Coronaviruses are a family of viruses named after the crown-like spikes on their surface. The novel coronavirus, SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), is a contagious respiratory virus that was first identified in December 2019 in Wuhan, China, and has resulted in an ongoing pandemic. The first case may be traced back to 17 November 2019. On 11 February 2020, the World Health Organization designated the name COVID-19 (coronavirus disease 2019) for the disease caused by the virus. As of 8 June 2020, more than 7.06 million cases had been reported across 188 countries and territories, resulting in more than 403,000 deaths; more than 3.16 million people had recovered. This notebook explores COVID-19 through data analysis and projections.
I chose the COVID-19 dataset from Our World in Data (https://ourworldindata.org/coronavirus). I will clean and analyze the data, focusing on the global figures, and draw some conclusions from it. The data was downloaded from https://covid.ourworldindata.org/data/owid-covid-data.csv.
Confirmed cases and deaths: data comes from the European Centre for Disease Prevention and Control (ECDC). Testing for COVID-19: data is collected by the Our World in Data team from official reports; you can find the source information for every country and further details in the post on COVID-19 testing. The testing dataset is updated around twice a week. Other variables: data is collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, etc.).
The information on this page is summarized from OWID's COVID-19 GitHub page. All of Our World in Data is completely open access and all work is licensed under the Creative Commons BY license. More information about the usage of content can be found on the OWID GitHub page: https://github.com/owid/covid-19-data/tree/master/public/data
According to OWID's COVID-19 GitHub page, the data has been collected, aggregated, and documented by Diana Beltekian, Daniel Gavrilov, Joe Hasell, Bobbie Macdonald, Edouard Mathieu, Esteban Ortiz-Ospina, Hannah Ritchie, and Max Roser.
I created a linear regression model, fit it to the OWID COVID-19 data, and projected world deaths for the next 30 days. The model was built with sklearn's LinearRegression using an 80/20 train/test split. To improve on the linear regression, I also built an XGBoost model, fit it to the same data, and projected world deaths for the next 30 days.
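The modelling steps described above can be sketched roughly as follows. This is a minimal illustration on a synthetic, perfectly linear series rather than the OWID data; the variable names and the `shuffle=False` setting are assumptions of this sketch, not the notebook's actual configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic cumulative death counts (illustrative only, not OWID data)
days = np.arange(100).reshape(-1, 1)      # day index as the single feature
deaths = 50.0 * days.ravel() + 1000.0     # toy linear series

# 80/20 split; shuffle=False keeps the time order (an assumption of this sketch)
X_train, X_test, y_train, y_test = train_test_split(
    days, deaths, test_size=0.2, shuffle=False
)

model = LinearRegression().fit(X_train, y_train)

# Project the 30 days that follow the last observed day
future_days = np.arange(100, 130).reshape(-1, 1)
projection = model.predict(future_days)
```

Keeping the split chronological means the test set is the most recent 20% of days, which is closer to how a projection is actually used than a random split would be.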
I will also create a model to predict the case mortality ratio of a country from its life expectancy, the percentage of its population aged 65 and over, its diabetes prevalence (diabetes_prevalence), and its cardiovascular death rate (cardiovasc_death_rate).
I decided to use population over age 65 and obesity because, worldwide, over 80% of the deaths were in the population aged 65 and over, and the CDC has stated that 94% of deaths involved some underlying health condition. Life expectancy per country is also used to account for possible deficiencies in the health care system. Johns Hopkins University has listed several diseases, such as heart disease and diabetes, that are known to be exacerbated by obesity. The idea is that the mortality ratio of COVID-19 can be predicted more accurately by using both the population aged 65 and over and obesity than by using the population aged 65 and over alone. This may show that creating a healthier population is the best way to prevent the kind of devastation in future pandemics that the world is currently facing.
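As a rough sketch of the target variable, the case mortality ratio of a country can be taken as its latest total deaths divided by its latest total cases. That definition is an assumption here, since the text does not spell it out, and the toy rows below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows using the OWID column names; values are illustrative only
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "date": ["2021-03-19", "2021-03-20", "2021-03-19", "2021-03-20"],
    "total_cases": [100.0, 200.0, 0.0, 0.0],
    "total_deaths": [2.0, 5.0, 0.0, 0.0],
})

# Keep the most recent row per location (ISO dates sort correctly as text)
latest = toy.sort_values("date").groupby("location").tail(1).set_index("location")

# Avoid division by zero: locations with no cases get NaN instead of inf
cases = latest["total_cases"].replace(0, np.nan)
latest["case_mortality_ratio"] = latest["total_deaths"] / cases
```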
# os to manipulate files
import os
# Importing pandas to work with DataFrames.
import pandas as pd
# Importing numpy for general numerical methods.
import numpy as np
import time
from datetime import datetime, date, timedelta
# Importing the matplotlib to create graphics
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
# Import seaborn to improve the visualizations
import seaborn as sns
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
# Scipy for statistics
from scipy import stats
from sklearn.metrics import mean_absolute_error,r2_score
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy import integrate, optimize
# ML libraries
import lightgbm as lgb
import xgboost as xgb
from xgboost import plot_importance, plot_tree
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn import preprocessing, svm
from sklearn.metrics import mean_squared_error, explained_variance_score
import sklearn
import matplotlib.dates as dates
import seaborn as seabornInstance
from sklearn import metrics
from scipy.stats import zscore
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from matplotlib import rcParams
#sns.set()
#sns.set_context('talk')
import warnings
warnings.filterwarnings('ignore')
# We'll download this file using the urlretrieve function from the urllib.request module.
from urllib.request import urlretrieve
urlretrieve('https://covid.ourworldindata.org/data/owid-covid-data.csv','owid-covid-data.csv')
('owid-covid-data.csv', <http.client.HTTPMessage at 0x205adf36f10>)
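Since the file is downloaded again every time the cell runs, one option is to skip the download when a local copy already exists. The helper below is a hypothetical sketch (`download_if_missing` is not part of the notebook or of `urllib`); it only wraps the same `urlretrieve` call.

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, path):
    """Download url to path unless the file is already present; return path."""
    if not os.path.exists(path):
        urlretrieve(url, path)
    return path
```

It could be called as `download_if_missing('https://covid.ourworldindata.org/data/owid-covid-data.csv', 'owid-covid-data.csv')`. Note that a cached copy will not pick up newer data until the local file is deleted.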
#Read data from a CSV file into a Pandas DataFrame object
world_covid19_df = pd.read_csv('owid-covid-data.csv')
owidcovidcodebook=pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-codebook.csv',index_col=0)
owidcovidcodebook
| column | source | description |
|---|---|---|
| iso_code | International Organization for Standardization | ISO 3166-1 alpha-3 – three-letter country codes |
| continent | Our World in Data | Continent of the geographical location |
| location | Our World in Data | Geographical location |
| date | Our World in Data | Date of observation |
| total_cases | COVID-19 Data Repository by the Center for Sys... | Total confirmed cases of COVID-19 |
| new_cases | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 |
| new_cases_smoothed | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 (7-day smoothed) |
| total_deaths | COVID-19 Data Repository by the Center for Sys... | Total deaths attributed to COVID-19 |
| new_deaths | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 |
| new_deaths_smoothed | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 (7-day smoot... |
| total_cases_per_million | COVID-19 Data Repository by the Center for Sys... | Total confirmed cases of COVID-19 per 1,000,00... |
| new_cases_per_million | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 per 1,000,000 ... |
| new_cases_smoothed_per_million | COVID-19 Data Repository by the Center for Sys... | New confirmed cases of COVID-19 (7-day smoothe... |
| total_deaths_per_million | COVID-19 Data Repository by the Center for Sys... | Total deaths attributed to COVID-19 per 1,000,... |
| new_deaths_per_million | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 per 1,000,00... |
| new_deaths_smoothed_per_million | COVID-19 Data Repository by the Center for Sys... | New deaths attributed to COVID-19 (7-day smoot... |
| reproduction_rate | Arroyo Marioli et al. (2020). https://doi.org/... | Real-time estimate of the effective reproducti... |
| icu_patients | European CDC for European countries / UK Gover... | Number of COVID-19 patients in intensive care ... |
| icu_patients_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients in intensive care ... |
| hosp_patients | European CDC for European countries / UK Gover... | Number of COVID-19 patients in hospital on a g... |
| hosp_patients_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients in hospital on a g... |
| weekly_icu_admissions | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| weekly_icu_admissions_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| weekly_hosp_admissions | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| weekly_hosp_admissions_per_million | European CDC for European countries / UK Gover... | Number of COVID-19 patients newly admitted to ... |
| total_tests | National government reports | Total tests for COVID-19 |
| new_tests | National government reports | New tests for COVID-19 (only calculated for co... |
| total_tests_per_thousand | National government reports | Total tests for COVID-19 per 1,000 people |
| new_tests_per_thousand | National government reports | New tests for COVID-19 per 1,000 people |
| new_tests_smoothed | National government reports | New tests for COVID-19 (7-day smoothed). For c... |
| new_tests_smoothed_per_thousand | National government reports | New tests for COVID-19 (7-day smoothed) per 1,... |
| positive_rate | National government reports | The share of COVID-19 tests that are positive,... |
| tests_per_case | National government reports | Tests conducted per new confirmed case of COVI... |
| tests_units | National government reports | Units used by the location to report its testi... |
| total_vaccinations | National government reports | Total number of COVID-19 vaccination doses adm... |
| people_vaccinated | National government reports | Total number of people who received at least o... |
| people_fully_vaccinated | National government reports | Total number of people who received all doses ... |
| new_vaccinations | National government reports | New COVID-19 vaccination doses administered (o... |
| new_vaccinations_smoothed | National government reports | New COVID-19 vaccination doses administered (7... |
| total_vaccinations_per_hundred | National government reports | Total number of COVID-19 vaccination doses adm... |
| people_vaccinated_per_hundred | National government reports | Total number of people who received at least o... |
| people_fully_vaccinated_per_hundred | National government reports | Total number of people who received all doses ... |
| new_vaccinations_smoothed_per_million | National government reports | New COVID-19 vaccination doses administered (7... |
| stringency_index | Oxford COVID-19 Government Response Tracker, B... | Government Response Stringency Index: composit... |
| population | United Nations, Department of Economic and Soc... | Population in 2020 |
| population_density | World Bank World Development Indicators, sourc... | Number of people divided by land area, measure... |
| median_age | UN Population Division, World Population Prosp... | Median age of the population, UN projection fo... |
| aged_65_older | World Bank World Development Indicators based ... | Share of the population that is 65 years and o... |
| aged_70_older | United Nations, Department of Economic and Soc... | Share of the population that is 70 years and o... |
| gdp_per_capita | World Bank World Development Indicators, sourc... | Gross domestic product at purchasing power par... |
| extreme_poverty | World Bank World Development Indicators, sourc... | Share of the population living in extreme pove... |
| cardiovasc_death_rate | Global Burden of Disease Collaborative Network... | Death rate from cardiovascular disease in 2017... |
| diabetes_prevalence | World Bank World Development Indicators, sourc... | Diabetes prevalence (% of population aged 20 t... |
| female_smokers | World Bank World Development Indicators, sourc... | Share of women who smoke, most recent year ava... |
| male_smokers | World Bank World Development Indicators, sourc... | Share of men who smoke, most recent year avail... |
| handwashing_facilities | United Nations Statistics Division | Share of the population with basic handwashing... |
| hospital_beds_per_thousand | OECD, Eurostat, World Bank, national governmen... | Hospital beds per 1,000 people, most recent ye... |
| life_expectancy | James C. Riley, Clio Infra, United Nations Pop... | Life expectancy at birth in 2019 |
| human_development_index | United Nations Development Programme (UNDP) | A composite index measuring average achievemen... |
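Because the codebook is indexed by column name, the full (untruncated) description of a single column can be looked up with `.loc`. The sketch below uses a small stand-in codebook with two rows copied from the table above; `describe_column` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for the codebook; the real one is loaded from the OWID CSV
toy_codebook = pd.DataFrame(
    {
        "source": ["Our World in Data", "National government reports"],
        "description": ["Date of observation", "Total tests for COVID-19"],
    },
    index=pd.Index(["date", "total_tests"], name="column"),
)


def describe_column(codebook, column):
    """Return the description text for one dataset column."""
    return codebook.loc[column, "description"]


full_text = describe_column(toy_codebook, "total_tests")
```

With the real DataFrame this would be `describe_column(owidcovidcodebook, 'aged_65_older')`. Setting `pd.set_option('display.max_colwidth', None)` is another way to see the truncated cells in full.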
world_covid19_df
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | NaN | NaN | NaN | NaN | ... | 1803.987 | NaN | 597.029 | 9.59 | NaN | NaN | 37.746 | 0.5 | 64.83 | 0.511 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76210 | ZWE | Africa | Zimbabwe | 2021-03-16 | 36535.0 | 31.0 | 30.571 | 1507.0 | 3.0 | 2.571 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76211 | ZWE | Africa | Zimbabwe | 2021-03-17 | 36552.0 | 17.0 | 30.143 | 1508.0 | 1.0 | 2.714 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76212 | ZWE | Africa | Zimbabwe | 2021-03-18 | 36611.0 | 59.0 | 33.429 | 1509.0 | 1.0 | 2.429 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76213 | ZWE | Africa | Zimbabwe | 2021-03-19 | 36652.0 | 41.0 | 32.714 | 1510.0 | 1.0 | 2.000 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76214 | ZWE | Africa | Zimbabwe | 2021-03-20 | 36662.0 | 10.0 | 27.286 | 1510.0 | 0.0 | 1.286 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
76215 rows × 59 columns
Data from the file is read and stored in a DataFrame object - one of the core data structures in Pandas for storing and working with tabular data. We typically use the _df suffix in the variable names for dataframes.
type(world_covid19_df)
pandas.core.frame.DataFrame
#Get the number of rows & columns as a tuple
world_covid19_df.shape
(76215, 59)
#View basic information about rows, columns & data types
world_covid19_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76215 entries, 0 to 76214
Data columns (total 59 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | iso_code | 76215 non-null | object |
| 1 | continent | 72473 non-null | object |
| 2 | location | 76215 non-null | object |
| 3 | date | 76215 non-null | object |
| 4 | total_cases | 74913 non-null | float64 |
| 5 | new_cases | 74911 non-null | float64 |
| 6 | new_cases_smoothed | 73910 non-null | float64 |
| 7 | total_deaths | 65624 non-null | float64 |
| 8 | new_deaths | 65782 non-null | float64 |
| 9 | new_deaths_smoothed | 73910 non-null | float64 |
| 10 | total_cases_per_million | 74505 non-null | float64 |
| 11 | new_cases_per_million | 74503 non-null | float64 |
| 12 | new_cases_smoothed_per_million | 73507 non-null | float64 |
| 13 | total_deaths_per_million | 65229 non-null | float64 |
| 14 | new_deaths_per_million | 65387 non-null | float64 |
| 15 | new_deaths_smoothed_per_million | 73507 non-null | float64 |
| 16 | reproduction_rate | 61298 non-null | float64 |
| 17 | icu_patients | 7930 non-null | float64 |
| 18 | icu_patients_per_million | 7930 non-null | float64 |
| 19 | hosp_patients | 9550 non-null | float64 |
| 20 | hosp_patients_per_million | 9550 non-null | float64 |
| 21 | weekly_icu_admissions | 697 non-null | float64 |
| 22 | weekly_icu_admissions_per_million | 697 non-null | float64 |
| 23 | weekly_hosp_admissions | 1224 non-null | float64 |
| 24 | weekly_hosp_admissions_per_million | 1224 non-null | float64 |
| 25 | new_tests | 34582 non-null | float64 |
| 26 | total_tests | 34364 non-null | float64 |
| 27 | total_tests_per_thousand | 34364 non-null | float64 |
| 28 | new_tests_per_thousand | 34582 non-null | float64 |
| 29 | new_tests_smoothed | 39573 non-null | float64 |
| 30 | new_tests_smoothed_per_thousand | 39573 non-null | float64 |
| 31 | positive_rate | 38264 non-null | float64 |
| 32 | tests_per_case | 37653 non-null | float64 |
| 33 | tests_units | 40920 non-null | object |
| 34 | total_vaccinations | 4978 non-null | float64 |
| 35 | people_vaccinated | 4471 non-null | float64 |
| 36 | people_fully_vaccinated | 2996 non-null | float64 |
| 37 | new_vaccinations | 4246 non-null | float64 |
| 38 | new_vaccinations_smoothed | 7571 non-null | float64 |
| 39 | total_vaccinations_per_hundred | 4978 non-null | float64 |
| 40 | people_vaccinated_per_hundred | 4471 non-null | float64 |
| 41 | people_fully_vaccinated_per_hundred | 2996 non-null | float64 |
| 42 | new_vaccinations_smoothed_per_million | 7571 non-null | float64 |
| 43 | stringency_index | 65065 non-null | float64 |
| 44 | population | 75798 non-null | float64 |
| 45 | population_density | 71134 non-null | float64 |
| 46 | median_age | 69108 non-null | float64 |
| 47 | aged_65_older | 68314 non-null | float64 |
| 48 | aged_70_older | 68719 non-null | float64 |
| 49 | gdp_per_capita | 69348 non-null | float64 |
| 50 | extreme_poverty | 47317 non-null | float64 |
| 51 | cardiovasc_death_rate | 69965 non-null | float64 |
| 52 | diabetes_prevalence | 70872 non-null | float64 |
| 53 | female_smokers | 54974 non-null | float64 |
| 54 | male_smokers | 54166 non-null | float64 |
| 55 | handwashing_facilities | 35112 non-null | float64 |
| 56 | hospital_beds_per_thousand | 64009 non-null | float64 |
| 57 | life_expectancy | 72418 non-null | float64 |
| 58 | human_development_index | 69909 non-null | float64 |

dtypes: float64(54), object(5)
memory usage: 34.3+ MB
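The non-null counts above show that some columns are mostly empty (for example, weekly_icu_admissions has 697 non-null values out of 76215 rows). A compact way to see this is the share of missing values per column; the sketch below uses a toy frame with made-up values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for world_covid19_df; values are illustrative only
toy = pd.DataFrame({
    "total_cases": [1.0, 2.0, 3.0, 4.0],
    "icu_patients": [np.nan, np.nan, np.nan, 7.0],
    "continent": ["Asia", None, "Asia", "Africa"],
})

# Fraction of missing values per column, most incomplete columns first
missing_share = toy.isna().mean().sort_values(ascending=False)
```

On the real data the same expression would be `world_covid19_df.isna().mean().sort_values(ascending=False)`.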
Many columns contain missing (NaN) values, and there are several possible approaches for dealing with them.
Dropping the rows that contain NaN is not sensible here: the data is a historical time series, and almost every row is missing a value in at least one column, so dropping rows would discard most of the dataset. Instead, the missing numerical values are replaced with 0. Keep in mind that these zeros are placeholders rather than real observations, and they pull down summary statistics such as the mean.
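To make the effect of zero-filling concrete, here is a small sketch on a made-up series: pandas skips NaN when computing a mean, whereas a filled 0 is counted as a real value.

```python
import numpy as np
import pandas as pd

# Made-up life expectancy values with one missing entry
life_expectancy = pd.Series([70.0, np.nan, 80.0])

mean_ignoring_nan = life_expectancy.mean()           # NaN is skipped by default
mean_after_fill = life_expectancy.fillna(0).mean()   # the 0 counts as a real value
```

Here the mean drops from 75.0 to 50.0 after filling. The same mechanism explains why columns such as life_expectancy show a minimum of 0 in the summary statistics later in the notebook.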
The feature lists below are built directly from the DataFrame columns rather than copied from a Pandas Profiling report.
# Let's first handle numerical features with NaN values
Numerical_feat = [feature for feature in world_covid19_df.columns if world_covid19_df[feature].dtypes != 'O']
print('Total numerical features: ', len(Numerical_feat))
print('\nNumerical Features: ', Numerical_feat)
Total numerical features: 54
Numerical Features: ['total_cases', 'new_cases', 'new_cases_smoothed', 'total_deaths', 'new_deaths', 'new_deaths_smoothed', 'total_cases_per_million', 'new_cases_per_million', 'new_cases_smoothed_per_million', 'total_deaths_per_million', 'new_deaths_per_million', 'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients', 'icu_patients_per_million', 'hosp_patients', 'hosp_patients_per_million', 'weekly_icu_admissions', 'weekly_icu_admissions_per_million', 'weekly_hosp_admissions', 'weekly_hosp_admissions_per_million', 'new_tests', 'total_tests', 'total_tests_per_thousand', 'new_tests_per_thousand', 'new_tests_smoothed', 'new_tests_smoothed_per_thousand', 'positive_rate', 'tests_per_case', 'total_vaccinations', 'people_vaccinated', 'people_fully_vaccinated', 'new_vaccinations', 'new_vaccinations_smoothed', 'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred', 'people_fully_vaccinated_per_hundred', 'new_vaccinations_smoothed_per_million', 'stringency_index', 'population', 'population_density', 'median_age', 'aged_65_older', 'aged_70_older', 'gdp_per_capita', 'extreme_poverty', 'cardiovasc_death_rate', 'diabetes_prevalence', 'female_smokers', 'male_smokers', 'handwashing_facilities', 'hospital_beds_per_thousand', 'life_expectancy', 'human_development_index']
# categorical features
categorical_feat = [feature for feature in world_covid19_df.columns if world_covid19_df[feature].dtypes=='O']
print('Total categorical features: ', len(categorical_feat))
print('\n',categorical_feat)
Total categorical features: 5
['iso_code', 'continent', 'location', 'date', 'tests_units']
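The same split into numerical and non-numerical columns can also be obtained with `DataFrame.select_dtypes`, which avoids comparing dtypes against `'O'` by hand. This is an alternative sketch on a toy frame, not the notebook's own method; the population values are placeholders.

```python
import pandas as pd

# Toy frame; column names follow the OWID dataset, population values are placeholders
toy = pd.DataFrame({
    "location": ["Afghanistan", "Zimbabwe"],
    "total_cases": [1.0, 36662.0],
    "population": [1000, 2000],
})

numerical = toy.select_dtypes(include="number").columns.tolist()
categorical = toy.select_dtypes(exclude="number").columns.tolist()
```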
## Replacing the numerical missing values
for feature in Numerical_feat:
    # Replace missing values with 0
    world_covid19_df[feature] = world_covid19_df[feature].fillna(0)
world_covid19_df[Numerical_feat].isnull().sum()
total_cases 0
new_cases 0
new_cases_smoothed 0
total_deaths 0
new_deaths 0
new_deaths_smoothed 0
total_cases_per_million 0
new_cases_per_million 0
new_cases_smoothed_per_million 0
total_deaths_per_million 0
new_deaths_per_million 0
new_deaths_smoothed_per_million 0
reproduction_rate 0
icu_patients 0
icu_patients_per_million 0
hosp_patients 0
hosp_patients_per_million 0
weekly_icu_admissions 0
weekly_icu_admissions_per_million 0
weekly_hosp_admissions 0
weekly_hosp_admissions_per_million 0
new_tests 0
total_tests 0
total_tests_per_thousand 0
new_tests_per_thousand 0
new_tests_smoothed 0
new_tests_smoothed_per_thousand 0
positive_rate 0
tests_per_case 0
total_vaccinations 0
people_vaccinated 0
people_fully_vaccinated 0
new_vaccinations 0
new_vaccinations_smoothed 0
total_vaccinations_per_hundred 0
people_vaccinated_per_hundred 0
people_fully_vaccinated_per_hundred 0
new_vaccinations_smoothed_per_million 0
stringency_index 0
population 0
population_density 0
median_age 0
aged_65_older 0
aged_70_older 0
gdp_per_capita 0
extreme_poverty 0
cardiovasc_death_rate 0
diabetes_prevalence 0
female_smokers 0
male_smokers 0
handwashing_facilities 0
hospital_beds_per_thousand 0
life_expectancy 0
human_development_index 0
dtype: int64
world_covid19_df
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.000 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76210 | ZWE | Africa | Zimbabwe | 2021-03-16 | 36535.0 | 31.0 | 30.571 | 1507.0 | 3.0 | 2.571 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76211 | ZWE | Africa | Zimbabwe | 2021-03-17 | 36552.0 | 17.0 | 30.143 | 1508.0 | 1.0 | 2.714 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76212 | ZWE | Africa | Zimbabwe | 2021-03-18 | 36611.0 | 59.0 | 33.429 | 1509.0 | 1.0 | 2.429 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76213 | ZWE | Africa | Zimbabwe | 2021-03-19 | 36652.0 | 41.0 | 32.714 | 1510.0 | 1.0 | 2.000 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
| 76214 | ZWE | Africa | Zimbabwe | 2021-03-20 | 36662.0 | 10.0 | 27.286 | 1510.0 | 0.0 | 1.286 | ... | 1899.775 | 21.4 | 307.846 | 1.82 | 1.6 | 30.7 | 36.791 | 1.7 | 61.49 | 0.571 |
76215 rows × 59 columns
#Store the clean DataFrame in a CSV file
world_covid19_df.to_csv('covid19_df_master.csv',index=False)
covid_df=pd.read_csv('covid19_df_master.csv')
#covid_df.hist(figsize=(15,15));
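One thing to keep in mind with the CSV round trip is that `read_csv` loads the date column as text unless told otherwise. A possible variation, sketched here on a toy frame written to a temporary file, is to pass `parse_dates` when reading the file back.

```python
import os
import tempfile

import pandas as pd

# Toy frame with the same first dates as the Afghanistan rows above
toy = pd.DataFrame({"date": ["2020-02-24", "2020-02-25"], "total_cases": [1.0, 1.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_master.csv")
    toy.to_csv(path, index=False)
    plain = pd.read_csv(path)                         # date is read as text
    parsed = pd.read_csv(path, parse_dates=["date"])  # date is read as datetime64

date_is_datetime = pd.api.types.is_datetime64_any_dtype(parsed["date"])
```

For the notebook's file this would be `pd.read_csv('covid19_df_master.csv', parse_dates=['date'])`.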
It appears that each column contains values of a specific data type. For the numeric columns, you can view statistical information such as the mean, standard deviation, minimum/maximum values, and number of non-empty values using the .describe method.
covid_df.describe().style.background_gradient(cmap="CMRmap_r")
| | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | total_cases_per_million | new_cases_per_million | new_cases_smoothed_per_million | total_deaths_per_million | new_deaths_per_million | new_deaths_smoothed_per_million | reproduction_rate | icu_patients | icu_patients_per_million | hosp_patients | hosp_patients_per_million | weekly_icu_admissions | weekly_icu_admissions_per_million | weekly_hosp_admissions | weekly_hosp_admissions_per_million | new_tests | total_tests | total_tests_per_thousand | new_tests_per_thousand | new_tests_smoothed | new_tests_smoothed_per_thousand | positive_rate | tests_per_case | total_vaccinations | people_vaccinated | people_fully_vaccinated | new_vaccinations | new_vaccinations_smoothed | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | new_vaccinations_smoothed_per_million | stringency_index | population | population_density | median_age | aged_65_older | aged_70_older | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 | 76215.000000 |
| mean | 664429.456551 | 5129.566568 | 5063.112630 | 17076.098655 | 114.393531 | 113.178646 | 7873.626153 | 66.505456 | 65.390387 | 160.782550 | 1.209663 | 1.194381 | 0.817522 | 104.170898 | 2.427559 | 581.768497 | 19.617366 | 2.352516 | 0.175219 | 60.790989 | 1.757699 | 17638.245293 | 2187700.430243 | 80.015992 | 0.720085 | 19339.450148 | 0.782154 | 0.044136 | 80.619000 | 503611.229863 | 314467.331864 | 100983.102303 | 16332.660697 | 16596.512301 | 0.524075 | 0.339373 | 0.126228 | 243.715712 | 50.208742 | 129789029.585436 | 312.872929 | 27.700474 | 7.880594 | 5.023743 | 17411.789829 | 8.249080 | 236.141847 | 7.253194 | 7.601972 | 23.205725 | 23.489918 | 2.548185 | 69.505214 | 0.667534 |
| std | 4698196.905872 | 32191.869926 | 31633.387699 | 108136.089442 | 674.709265 | 652.256822 | 15662.046239 | 168.372816 | 141.179154 | 315.421600 | 3.623517 | 2.756024 | 0.509057 | 1038.799768 | 10.996649 | 4734.364944 | 85.827497 | 57.094248 | 3.765821 | 1554.711563 | 32.119539 | 102929.078224 | 15326886.721775 | 265.393666 | 3.099126 | 100759.298246 | 2.652978 | 0.082895 | 624.528191 | 7953373.735704 | 4766946.098613 | 1766444.270360 | 256803.603832 | 218893.084722 | 4.590345 | 2.888190 | 1.552936 | 1490.438593 | 29.046831 | 694541376.973729 | 1579.306203 | 12.419133 | 6.482936 | 4.370634 | 19609.485356 | 16.968226 | 133.738715 | 4.294111 | 10.021965 | 18.657876 | 33.329193 | 2.519296 | 17.534691 | 0.246698 |
| min | 0.000000 | -74347.000000 | -6223.000000 | 0.000000 | -1918.000000 | -232.143000 | 0.000000 | -2153.437000 | -276.825000 | 0.000000 | -76.445000 | -10.921000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -239172.000000 | 0.000000 | 0.000000 | -23.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 673.500000 | 1.000000 | 4.429000 | 10.000000 | 0.000000 | 0.000000 | 123.578000 | 0.061000 | 0.741000 | 1.603000 | 0.000000 | 0.000000 | 0.520000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30.560000 | 2540916.000000 | 24.282000 | 19.600000 | 3.008000 | 1.783000 | 2896.913000 | 0.000000 | 140.448000 | 4.610000 | 0.000000 | 0.000000 | 0.000000 | 0.700000 | 66.470000 | 0.555000 |
| 50% | 7821.000000 | 55.000000 | 63.000000 | 130.000000 | 1.000000 | 0.857000 | 1036.897000 | 5.861000 | 7.536000 | 19.611000 | 0.019000 | 0.099000 | 0.960000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 89.000000 | 0.007000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 55.560000 | 10099270.000000 | 80.080000 | 29.000000 | 5.440000 | 3.212000 | 10727.146000 | 0.500000 | 233.070000 | 6.930000 | 2.100000 | 24.500000 | 0.000000 | 2.000000 | 74.160000 | 0.737000 |
| 75% | 87043.000000 | 662.000000 | 666.785500 | 1710.000000 | 11.000000 | 11.571000 | 7189.766500 | 55.940000 | 61.936000 | 139.170000 | 0.804000 | 0.975000 | 1.130000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 3895.000000 | 378435.000000 | 30.874500 | 0.375000 | 5638.500000 | 0.534000 | 0.052000 | 18.200000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 73.150000 | 36910558.000000 | 204.430000 | 38.000000 | 13.260000 | 8.353000 | 25063.846000 | 4.500000 | 318.949000 | 9.750000 | 13.000000 | 37.700000 | 47.782000 | 3.600000 | 78.490000 | 0.828000 |
| max | 122813796.000000 | 880902.000000 | 739564.429000 | 2709639.000000 | 17895.000000 | 14424.000000 | 148592.506000 | 8652.658000 | 2648.773000 | 2327.774000 | 218.329000 | 63.140000 | 6.740000 | 30028.000000 | 189.561000 | 129812.000000 | 1042.535000 | 4037.019000 | 276.325000 | 116385.000000 | 2656.911000 | 2945871.000000 | 355058178.000000 | 3857.663000 | 327.086000 | 1858135.000000 | 59.929000 | 0.742000 | 44258.700000 | 436370147.000000 | 258958639.000000 | 99942889.000000 | 16650022.000000 | 11248913.000000 | 155.330000 | 89.160000 | 66.170000 | 54264.000000 | 100.000000 | 7794798729.000000 | 20546.766000 | 48.200000 | 27.049000 | 18.493000 | 116935.600000 | 77.600000 | 724.417000 | 30.530000 | 44.000000 | 78.100000 | 98.999000 | 13.800000 | 86.750000 | 0.957000 |
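The min row above contains negative values for daily columns such as new_cases and new_deaths, which cannot be real daily counts; they are presumably retrospective corrections by the reporting sources (an assumption, since the notebook does not discuss them). A minimal sketch for locating such rows, on made-up data:

```python
import pandas as pd

# Toy frame with a few negative daily values (illustrative only)
toy = pd.DataFrame({
    "location": ["A", "A", "A"],
    "new_cases": [10.0, -3.0, 5.0],
    "new_deaths": [1.0, 0.0, -1.0],
})

daily_cols = ["new_cases", "new_deaths"]

# Number of negative entries per daily column
negative_counts = (toy[daily_cols] < 0).sum()

# Rows where at least one daily column is negative
rows_with_negative = toy[(toy[daily_cols] < 0).any(axis=1)]
```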
While we have looked at overall numbers for cases, tests, positive rate, etc., it would also be useful to study these numbers on a month-by-month basis. The date column comes in handy here, as pandas provides many utilities for working with dates.
#covid_df['date'] = pd.to_datetime(covid_df.date)
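After this conversion the column has the datatype datetime64, and different parts of the date can be extracted into separate columns using the DatetimeIndex class. Note that these lines are commented out in this notebook, so date is still listed as an object column in the covid_df.info() output further down.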
#covid_df['year'] = pd.DatetimeIndex(covid_df.date).year
#covid_df['month'] = pd.DatetimeIndex(covid_df.date).month
#covid_df['day'] = pd.DatetimeIndex(covid_df.date).day
#covid_df['weekday'] = pd.DatetimeIndex(covid_df.date).weekday
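As a minimal sketch of the same idea, the example below runs the conversion on a small toy frame. The name `toy_df` and its values are made up for illustration and are not taken from the OWID file. It uses the `.dt` accessor, which gives the same date parts as `pd.DatetimeIndex`, and then uses the new columns for a month-by-month total.

```python
import pandas as pd

# Hypothetical toy data standing in for covid_df; the numbers are invented.
toy_df = pd.DataFrame({
    "date": ["2020-02-28", "2020-02-29", "2020-03-01", "2020-03-02"],
    "new_cases": [1.0, 0.0, 2.0, 3.0],
})

# Parse the date strings into datetime64 values.
toy_df["date"] = pd.to_datetime(toy_df["date"])

# Extract the date parts with the .dt accessor.
toy_df["year"] = toy_df["date"].dt.year
toy_df["month"] = toy_df["date"].dt.month
toy_df["day"] = toy_df["date"].dt.day
toy_df["weekday"] = toy_df["date"].dt.weekday  # Monday=0 ... Sunday=6

# Month-by-month totals of new cases.
monthly_new_cases = toy_df.groupby(["year", "month"])["new_cases"].sum()
print(monthly_new_cases)
```

Grouping by both year and month keeps, for example, March 2020 and March 2021 apart once the data spans more than one year.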
covid_df.head(10)
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 6 | AFG | Asia | Afghanistan | 2020-03-01 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 7 | AFG | Asia | Afghanistan | 2020-03-02 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 8 | AFG | Asia | Afghanistan | 2020-03-03 | 2.0 | 1.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
| 9 | AFG | Asia | Afghanistan | 2020-03-04 | 4.0 | 2.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 1803.987 | 0.0 | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 |
10 rows × 59 columns
sum(covid_df.duplicated())
0
covid_df.isnull().sum()
iso_code 0
continent 3742
location 0
date 0
total_cases 0
new_cases 0
new_cases_smoothed 0
total_deaths 0
new_deaths 0
new_deaths_smoothed 0
total_cases_per_million 0
new_cases_per_million 0
new_cases_smoothed_per_million 0
total_deaths_per_million 0
new_deaths_per_million 0
new_deaths_smoothed_per_million 0
reproduction_rate 0
icu_patients 0
icu_patients_per_million 0
hosp_patients 0
hosp_patients_per_million 0
weekly_icu_admissions 0
weekly_icu_admissions_per_million 0
weekly_hosp_admissions 0
weekly_hosp_admissions_per_million 0
new_tests 0
total_tests 0
total_tests_per_thousand 0
new_tests_per_thousand 0
new_tests_smoothed 0
new_tests_smoothed_per_thousand 0
positive_rate 0
tests_per_case 0
tests_units 35295
total_vaccinations 0
people_vaccinated 0
people_fully_vaccinated 0
new_vaccinations 0
new_vaccinations_smoothed 0
total_vaccinations_per_hundred 0
people_vaccinated_per_hundred 0
people_fully_vaccinated_per_hundred 0
new_vaccinations_smoothed_per_million 0
stringency_index 0
population 0
population_density 0
median_age 0
aged_65_older 0
aged_70_older 0
gdp_per_capita 0
extreme_poverty 0
cardiovasc_death_rate 0
diabetes_prevalence 0
female_smokers 0
male_smokers 0
handwashing_facilities 0
hospital_beds_per_thousand 0
life_expectancy 0
human_development_index 0
dtype: int64
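Since only two columns have missing values, the listing is easier to read when restricted to those columns. The sketch below shows one way to do that on a hypothetical toy frame. The name `toy_df` and its values are invented for illustration, and the same three lines could be applied to `covid_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration.
toy_df = pd.DataFrame({
    "location": ["Afghanistan", "World", "Albania", "Europe"],
    "continent": ["Asia", np.nan, "Europe", np.nan],
    "tests_units": [np.nan, np.nan, "tests performed", np.nan],
    "total_cases": [1.0, 5.0, 2.0, 3.0],
})

# Count missing values per column.
missing_counts = toy_df.isnull().sum()

# Keep only the columns with at least one missing value, largest first.
missing_only = missing_counts[missing_counts > 0].sort_values(ascending=False)

# Share of rows that are missing in each of those columns.
missing_share = missing_only / len(toy_df)

print(pd.DataFrame({"missing": missing_only, "share": missing_share}))
```

In this toy data the rows without a continent are aggregates such as World. Whether that explains all 3742 missing continent values in the real file would need to be checked against the data.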
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76215 entries, 0 to 76214
Data columns (total 59 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | iso_code | 76215 non-null | object |
| 1 | continent | 72473 non-null | object |
| 2 | location | 76215 non-null | object |
| 3 | date | 76215 non-null | object |
| 4 | total_cases | 76215 non-null | float64 |
| 5 | new_cases | 76215 non-null | float64 |
| 6 | new_cases_smoothed | 76215 non-null | float64 |
| 7 | total_deaths | 76215 non-null | float64 |
| 8 | new_deaths | 76215 non-null | float64 |
| 9 | new_deaths_smoothed | 76215 non-null | float64 |
| 10 | total_cases_per_million | 76215 non-null | float64 |
| 11 | new_cases_per_million | 76215 non-null | float64 |
| 12 | new_cases_smoothed_per_million | 76215 non-null | float64 |
| 13 | total_deaths_per_million | 76215 non-null | float64 |
| 14 | new_deaths_per_million | 76215 non-null | float64 |
| 15 | new_deaths_smoothed_per_million | 76215 non-null | float64 |
| 16 | reproduction_rate | 76215 non-null | float64 |
| 17 | icu_patients | 76215 non-null | float64 |
| 18 | icu_patients_per_million | 76215 non-null | float64 |
| 19 | hosp_patients | 76215 non-null | float64 |
| 20 | hosp_patients_per_million | 76215 non-null | float64 |
| 21 | weekly_icu_admissions | 76215 non-null | float64 |
| 22 | weekly_icu_admissions_per_million | 76215 non-null | float64 |
| 23 | weekly_hosp_admissions | 76215 non-null | float64 |
| 24 | weekly_hosp_admissions_per_million | 76215 non-null | float64 |
| 25 | new_tests | 76215 non-null | float64 |
| 26 | total_tests | 76215 non-null | float64 |
| 27 | total_tests_per_thousand | 76215 non-null | float64 |
| 28 | new_tests_per_thousand | 76215 non-null | float64 |
| 29 | new_tests_smoothed | 76215 non-null | float64 |
| 30 | new_tests_smoothed_per_thousand | 76215 non-null | float64 |
| 31 | positive_rate | 76215 non-null | float64 |
| 32 | tests_per_case | 76215 non-null | float64 |
| 33 | tests_units | 40920 non-null | object |
| 34 | total_vaccinations | 76215 non-null | float64 |
| 35 | people_vaccinated | 76215 non-null | float64 |
| 36 | people_fully_vaccinated | 76215 non-null | float64 |
| 37 | new_vaccinations | 76215 non-null | float64 |
| 38 | new_vaccinations_smoothed | 76215 non-null | float64 |
| 39 | total_vaccinations_per_hundred | 76215 non-null | float64 |
| 40 | people_vaccinated_per_hundred | 76215 non-null | float64 |
| 41 | people_fully_vaccinated_per_hundred | 76215 non-null | float64 |
| 42 | new_vaccinations_smoothed_per_million | 76215 non-null | float64 |
| 43 | stringency_index | 76215 non-null | float64 |
| 44 | population | 76215 non-null | float64 |
| 45 | population_density | 76215 non-null | float64 |
| 46 | median_age | 76215 non-null | float64 |
| 47 | aged_65_older | 76215 non-null | float64 |
| 48 | aged_70_older | 76215 non-null | float64 |
| 49 | gdp_per_capita | 76215 non-null | float64 |
| 50 | extreme_poverty | 76215 non-null | float64 |
| 51 | cardiovasc_death_rate | 76215 non-null | float64 |
| 52 | diabetes_prevalence | 76215 non-null | float64 |
| 53 | female_smokers | 76215 non-null | float64 |
| 54 | male_smokers | 76215 non-null | float64 |
| 55 | handwashing_facilities | 76215 non-null | float64 |
| 56 | hospital_beds_per_thousand | 76215 non-null | float64 |
| 57 | life_expectancy | 76215 non-null | float64 |
| 58 | human_development_index | 76215 non-null | float64 |

dtypes: float64(54), object(5)
memory usage: 34.3+ MB
covid_df.describe().T.style.background_gradient(cmap="CMRmap_r")
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_cases | 76215.000000 | 664429.456551 | 4698196.905872 | 0.000000 | 673.500000 | 7821.000000 | 87043.000000 | 122813796.000000 |
| new_cases | 76215.000000 | 5129.566568 | 32191.869926 | -74347.000000 | 1.000000 | 55.000000 | 662.000000 | 880902.000000 |
| new_cases_smoothed | 76215.000000 | 5063.112630 | 31633.387699 | -6223.000000 | 4.429000 | 63.000000 | 666.785500 | 739564.429000 |
| total_deaths | 76215.000000 | 17076.098655 | 108136.089442 | 0.000000 | 10.000000 | 130.000000 | 1710.000000 | 2709639.000000 |
| new_deaths | 76215.000000 | 114.393531 | 674.709265 | -1918.000000 | 0.000000 | 1.000000 | 11.000000 | 17895.000000 |
| new_deaths_smoothed | 76215.000000 | 113.178646 | 652.256822 | -232.143000 | 0.000000 | 0.857000 | 11.571000 | 14424.000000 |
| total_cases_per_million | 76215.000000 | 7873.626153 | 15662.046239 | 0.000000 | 123.578000 | 1036.897000 | 7189.766500 | 148592.506000 |
| new_cases_per_million | 76215.000000 | 66.505456 | 168.372816 | -2153.437000 | 0.061000 | 5.861000 | 55.940000 | 8652.658000 |
| new_cases_smoothed_per_million | 76215.000000 | 65.390387 | 141.179154 | -276.825000 | 0.741000 | 7.536000 | 61.936000 | 2648.773000 |
| total_deaths_per_million | 76215.000000 | 160.782550 | 315.421600 | 0.000000 | 1.603000 | 19.611000 | 139.170000 | 2327.774000 |
| new_deaths_per_million | 76215.000000 | 1.209663 | 3.623517 | -76.445000 | 0.000000 | 0.019000 | 0.804000 | 218.329000 |
| new_deaths_smoothed_per_million | 76215.000000 | 1.194381 | 2.756024 | -10.921000 | 0.000000 | 0.099000 | 0.975000 | 63.140000 |
| reproduction_rate | 76215.000000 | 0.817522 | 0.509057 | 0.000000 | 0.520000 | 0.960000 | 1.130000 | 6.740000 |
| icu_patients | 76215.000000 | 104.170898 | 1038.799768 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30028.000000 |
| icu_patients_per_million | 76215.000000 | 2.427559 | 10.996649 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 189.561000 |
| hosp_patients | 76215.000000 | 581.768497 | 4734.364944 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 129812.000000 |
| hosp_patients_per_million | 76215.000000 | 19.617366 | 85.827497 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1042.535000 |
| weekly_icu_admissions | 76215.000000 | 2.352516 | 57.094248 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4037.019000 |
| weekly_icu_admissions_per_million | 76215.000000 | 0.175219 | 3.765821 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 276.325000 |
| weekly_hosp_admissions | 76215.000000 | 60.790989 | 1554.711563 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 116385.000000 |
| weekly_hosp_admissions_per_million | 76215.000000 | 1.757699 | 32.119539 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2656.911000 |
| new_tests | 76215.000000 | 17638.245293 | 102929.078224 | -239172.000000 | 0.000000 | 0.000000 | 3895.000000 | 2945871.000000 |
| total_tests | 76215.000000 | 2187700.430243 | 15326886.721775 | 0.000000 | 0.000000 | 0.000000 | 378435.000000 | 355058178.000000 |
| total_tests_per_thousand | 76215.000000 | 80.015992 | 265.393666 | 0.000000 | 0.000000 | 0.000000 | 30.874500 | 3857.663000 |
| new_tests_per_thousand | 76215.000000 | 0.720085 | 3.099126 | -23.010000 | 0.000000 | 0.000000 | 0.375000 | 327.086000 |
| new_tests_smoothed | 76215.000000 | 19339.450148 | 100759.298246 | 0.000000 | 0.000000 | 89.000000 | 5638.500000 | 1858135.000000 |
| new_tests_smoothed_per_thousand | 76215.000000 | 0.782154 | 2.652978 | 0.000000 | 0.000000 | 0.007000 | 0.534000 | 59.929000 |
| positive_rate | 76215.000000 | 0.044136 | 0.082895 | 0.000000 | 0.000000 | 0.000000 | 0.052000 | 0.742000 |
| tests_per_case | 76215.000000 | 80.619000 | 624.528191 | 0.000000 | 0.000000 | 0.000000 | 18.200000 | 44258.700000 |
| total_vaccinations | 76215.000000 | 503611.229863 | 7953373.735704 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 436370147.000000 |
| people_vaccinated | 76215.000000 | 314467.331864 | 4766946.098613 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 258958639.000000 |
| people_fully_vaccinated | 76215.000000 | 100983.102303 | 1766444.270360 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 99942889.000000 |
| new_vaccinations | 76215.000000 | 16332.660697 | 256803.603832 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 16650022.000000 |
| new_vaccinations_smoothed | 76215.000000 | 16596.512301 | 218893.084722 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11248913.000000 |
| total_vaccinations_per_hundred | 76215.000000 | 0.524075 | 4.590345 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 155.330000 |
| people_vaccinated_per_hundred | 76215.000000 | 0.339373 | 2.888190 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 89.160000 |
| people_fully_vaccinated_per_hundred | 76215.000000 | 0.126228 | 1.552936 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 66.170000 |
| new_vaccinations_smoothed_per_million | 76215.000000 | 243.715712 | 1490.438593 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 54264.000000 |
| stringency_index | 76215.000000 | 50.208742 | 29.046831 | 0.000000 | 30.560000 | 55.560000 | 73.150000 | 100.000000 |
| population | 76215.000000 | 129789029.585436 | 694541376.973729 | 0.000000 | 2540916.000000 | 10099270.000000 | 36910558.000000 | 7794798729.000000 |
| population_density | 76215.000000 | 312.872929 | 1579.306203 | 0.000000 | 24.282000 | 80.080000 | 204.430000 | 20546.766000 |
| median_age | 76215.000000 | 27.700474 | 12.419133 | 0.000000 | 19.600000 | 29.000000 | 38.000000 | 48.200000 |
| aged_65_older | 76215.000000 | 7.880594 | 6.482936 | 0.000000 | 3.008000 | 5.440000 | 13.260000 | 27.049000 |
| aged_70_older | 76215.000000 | 5.023743 | 4.370634 | 0.000000 | 1.783000 | 3.212000 | 8.353000 | 18.493000 |
| gdp_per_capita | 76215.000000 | 17411.789829 | 19609.485356 | 0.000000 | 2896.913000 | 10727.146000 | 25063.846000 | 116935.600000 |
| extreme_poverty | 76215.000000 | 8.249080 | 16.968226 | 0.000000 | 0.000000 | 0.500000 | 4.500000 | 77.600000 |
| cardiovasc_death_rate | 76215.000000 | 236.141847 | 133.738715 | 0.000000 | 140.448000 | 233.070000 | 318.949000 | 724.417000 |
| diabetes_prevalence | 76215.000000 | 7.253194 | 4.294111 | 0.000000 | 4.610000 | 6.930000 | 9.750000 | 30.530000 |
| female_smokers | 76215.000000 | 7.601972 | 10.021965 | 0.000000 | 0.000000 | 2.100000 | 13.000000 | 44.000000 |
| male_smokers | 76215.000000 | 23.205725 | 18.657876 | 0.000000 | 0.000000 | 24.500000 | 37.700000 | 78.100000 |
| handwashing_facilities | 76215.000000 | 23.489918 | 33.329193 | 0.000000 | 0.000000 | 0.000000 | 47.782000 | 98.999000 |
| hospital_beds_per_thousand | 76215.000000 | 2.548185 | 2.519296 | 0.000000 | 0.700000 | 2.000000 | 3.600000 | 13.800000 |
| life_expectancy | 76215.000000 | 69.505214 | 17.534691 | 0.000000 | 66.470000 | 74.160000 | 78.490000 | 86.750000 |
| human_development_index | 76215.000000 | 0.667534 | 0.246698 | 0.000000 | 0.555000 | 0.737000 | 0.828000 | 0.957000 |
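# Sum the numeric columns per continent (text columns such as location and date are skipped)
data_popu = covid_df.groupby('continent').sum(numeric_only=True)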
plt.figure(figsize=(20, 18))
sns.set_style('ticks')
# Population by country in Asia (each bar is the mean of the country's daily rows, which equals its population)
plt.subplot(221)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Asia']).set_title('Population by country in Asia')
# Population by country in North America
plt.subplot(222)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'North America']).set_title('Population by country in North America')
# Population by country in South America
plt.subplot(223)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'South America']).set_title('Population by country in South America')
# Population by country in Europe
plt.subplot(224)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Europe']).set_title('Population by country in Europe')
# Leave room between the four panels for the country labels
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.2)
plt.show()
# Population by country in Africa
plt.figure(figsize=(20, 18))
plt.subplot(221)
sns.set_style('ticks')
#sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Africa']).set_title('Population by country in Africa')
# Population by country in Oceania
plt.subplot(222)
sns.barplot(y='location', x='population', data=covid_df[covid_df['continent'] == 'Oceania']).set_title('Population by country in Oceania')
# Leave room between the two panels for the country labels
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.2)
plt.show()
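The plots above pass every daily row to `sns.barplot`, which then averages many identical population values per country. A possible alternative, sketched below, is to reduce the data to one row per country first. The helper `country_populations` and the toy frame `toy_df` are hypothetical names introduced for this example, and the population figures are invented.

```python
import pandas as pd

# Hypothetical toy frame: one row per country per day, population repeated daily.
toy_df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "location": ["India", "India", "Japan", "Japan", "Italy", "Italy"],
    "date": ["2020-03-01", "2020-03-02"] * 3,
    "population": [300.0, 300.0, 100.0, 100.0, 60.0, 60.0],  # invented values
})


def country_populations(df, continent):
    """Return one population value per country in `continent`, largest first."""
    subset = df[df["continent"] == continent]
    # Population does not change from day to day, so keep the first row per country.
    per_country = subset.drop_duplicates(subset="location")
    return per_country.set_index("location")["population"].sort_values(ascending=False)


asia_population = country_populations(toy_df, "Asia")
print(asia_population)
```

The resulting Series could then be plotted with `sns.barplot(x=asia_population.values, y=asia_population.index)`, which also gives bars sorted by size.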
# Top 10 countries by population in Asia
top_popu_asia = covid_df[covid_df['continent'] == 'Asia']
print("Top 10 countries by population in Asia:\n", top_popu_asia.groupby(['continent', 'location'])['population'].max().nlargest(10))
Top 10 countries by population in Asia:
continent location
Asia China 1.439324e+09
India 1.380004e+09
Indonesia 2.735236e+08
Pakistan 2.208923e+08
Bangladesh 1.646894e+08
Japan 1.264765e+08
Philippines 1.095811e+08
Vietnam 9.733858e+07
Turkey 8.433907e+07
Iran 8.399295e+07
Name: population, dtype: float64
# Top 10 countries by population in North America
top_popu_north_america = covid_df[covid_df['continent'] == 'North America']
print("Top 10 countries by population in North America:\n", top_popu_north_america.groupby(['continent', 'location'])['population'].max().nlargest(10))
Top 10 countries by population in North America:
continent location
North America United States 331002647.0
Mexico 128932753.0
Canada 37742157.0
Guatemala 17915567.0
Haiti 11402533.0
Cuba 11326616.0
Dominican Republic 10847904.0
Honduras 9904608.0
Nicaragua 6624554.0
El Salvador 6486201.0
Name: population, dtype: float64
# Top 10 countries by population in South America
top_popu_south_america = covid_df[covid_df['continent'] == 'South America']
print("Top 10 countries by population in South America:\n", top_popu_south_america.groupby(['continent', 'location'])['population'].max().nlargest(10))
Top 10 countries by population in South America:
continent location
South America Brazil 212559409.0
Colombia 50882884.0
Argentina 45195777.0
Peru 32971846.0
Venezuela 28435943.0
Chile 19116209.0
Ecuador 17643060.0
Bolivia 11673029.0
Paraguay 7132530.0
Uruguay 3473727.0
Name: population, dtype: float64
# Top 10 countries by population in Europe
top_popu_europe = covid_df[covid_df['continent'] == 'Europe']
print("Top 10 countries by population in Europe:\n", top_popu_europe.groupby(['continent', 'location'])['population'].max().nlargest(10))
Top 10 countries by population in Europe:
continent location
Europe Russia 145934460.0
Germany 83783945.0
France 68147687.0
United Kingdom 67886004.0
Italy 60461828.0
Spain 46754783.0
Ukraine 43733759.0
Poland 37846605.0
Romania 19237682.0
Netherlands 17134873.0
Name: population, dtype: float64
# The 10 least populous countries and territories in Europe
data=covid_df[covid_df['continent'] == 'Europe']
data.groupby(['continent','location'])['population'].min().nsmallest(10)
continent location
Europe Vatican 809.0
Gibraltar 33691.0
San Marino 33938.0
Liechtenstein 38137.0
Monaco 39244.0
Faeroe Islands 48865.0
Guernsey 67052.0
Andorra 77265.0
Isle of Man 85032.0
Jersey 101073.0
Name: population, dtype: float64
# Top 10 countries by population in Africa
top_popu_africa = covid_df[covid_df['continent'] == 'Africa']
print("Top 10 countries by population in Africa:\n", top_popu_africa.groupby(['continent', 'location'])['population'].max().nlargest(10))
Top 10 countries by population in Africa:
continent location
Africa Nigeria 206139587.0
Ethiopia 114963583.0
Egypt 102334403.0
Democratic Republic of Congo 89561404.0
Tanzania 59734213.0
South Africa 59308690.0
Kenya 53771300.0
Uganda 45741000.0
Algeria 43851043.0
Sudan 43849269.0
Name: population, dtype: float64
# Top 10 countries by population in Oceania
top_popu_oceania = covid_df[covid_df['continent'] == 'Oceania']
print("Top 10 countries by population in Oceania:\n", top_popu_oceania.groupby(['continent', 'location'])['population'].max().nlargest(10))
Top 10 countries by population in Oceania:
continent location
Oceania Australia 25499881.0
Papua New Guinea 8947027.0
New Zealand 4822233.0
Fiji 896444.0
Solomon Islands 686878.0
Vanuatu 307150.0
Samoa 198410.0
Micronesia (country) 115021.0
Marshall Islands 59194.0
Name: population, dtype: float64
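The six cells above repeat the same filter, group, and `nlargest` pattern once per continent. A minimal sketch of doing this in one pass is shown below. The helper `top_populations` and the toy frame `toy_df` are hypothetical names for this example, and the population values are invented.

```python
import pandas as pd

# Hypothetical toy frame with repeated daily rows; the values are invented.
toy_df = pd.DataFrame({
    "continent": ["Asia"] * 6 + ["Oceania"] * 4,
    "location": ["China", "China", "India", "India", "Japan", "Japan",
                 "Australia", "Australia", "Fiji", "Fiji"],
    "population": [1400.0, 1400.0, 1300.0, 1300.0, 120.0, 120.0,
                   25.0, 25.0, 0.9, 0.9],
})


def top_populations(df, n=10):
    """Map each continent to its n most populous countries."""
    # One population value per (continent, country) pair.
    per_country = df.groupby(["continent", "location"])["population"].max()
    return {
        continent: group.droplevel("continent").nlargest(n)
        for continent, group in per_country.groupby(level="continent")
    }


top_by_continent = top_populations(toy_df, n=2)
for continent, top in top_by_continent.items():
    print(f"Top countries by population in {continent}:")
    print(top, end="\n\n")
```

Rows with a missing continent are dropped by `groupby`, so aggregate rows without a continent value would not appear in the result.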
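# Note: population is repeated on every daily row, so this sum counts each country once per date.
# The totals below are therefore far larger than the real continental populations.
continent_populations_df = covid_df.groupby(['continent'])['population'].sum()
continent_populations_df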
continent
Africa 5.072744e+11
Asia 1.890449e+12
Europe 3.055237e+11
North America 2.476494e+11
Oceania 1.641354e+10
South America 1.701631e+11
Name: population, dtype: float64
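These totals are in the hundreds of billions because each country's population is added once for every date in the data. One way to get a total per continent, sketched here on a hypothetical toy frame with invented values, is to take a single population value per country first and then sum those values.

```python
import pandas as pd

# Hypothetical toy frame: two daily rows per country; the values are invented.
toy_df = pd.DataFrame({
    "continent": ["Asia"] * 4 + ["Europe"] * 2,
    "location": ["India", "India", "Japan", "Japan", "Italy", "Italy"],
    "date": ["2020-03-01", "2020-03-02"] * 3,
    "population": [300.0, 300.0, 100.0, 100.0, 60.0, 60.0],
})

# Summing every row counts each country once per date.
naive_totals = toy_df.groupby("continent")["population"].sum()

# Reduce to one value per country, then sum within each continent.
per_country = toy_df.groupby(["continent", "location"])["population"].max()
continent_totals = per_country.groupby(level="continent").sum()

print(naive_totals)
print(continent_totals)
```

With two dates per country, the naive totals in this toy example are exactly twice the per-country totals.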
# Mean and max of total_cases, total_deaths, total_tests and total_vaccinations for each country in Asia
data_total=covid_df[covid_df['continent'] == 'Asia']
data_total.groupby(['continent','location']).agg({'total_cases': ['mean','max'],'total_deaths':['mean','max'],'total_tests':['mean','max'],'total_vaccinations':['mean','max']}).style.background_gradient(cmap="CMRmap_r")
| location | total_cases mean | total_cases max | total_deaths mean | total_deaths max | total_tests mean | total_tests max | total_vaccinations mean | total_vaccinations max | |
|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 32800.465473 | 56093.000000 | 1243.191816 | 2462.000000 | 0.000000 | 0.000000 | 159.079284 | 54000.000000 | |
| Armenia | 72477.093506 | 183127.000000 | 1282.075325 | 3332.000000 | 167847.475325 | 788953.000000 | 0.000000 | 0.000000 | |
| Azerbaijan | 81129.828571 | 245490.000000 | 1060.303896 | 3339.000000 | 0.000000 | 0.000000 | 8624.532468 | 453586.000000 | |
| Bahrain | 55488.445013 | 135326.000000 | 202.920716 | 498.000000 | 1134590.184143 | 3368947.000000 | 24745.964194 | 640104.000000 | |
| Bangladesh | 290780.796345 | 568706.000000 | 4216.869452 | 8668.000000 | 1760640.963446 | 4349615.000000 | 166798.676240 | 4760747.000000 | |
| Bhutan | 316.507895 | 869.000000 | 0.189474 | 1.000000 | 159224.386842 | 570591.000000 | 0.000000 | 0.000000 | |
| Brunei | 147.522546 | 205.000000 | 2.477454 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Cambodia | 276.434368 | 1578.000000 | 0.028640 | 2.000000 | 0.000000 | 0.000000 | 3161.455847 | 170659.000000 | |
| China | 85702.589623 | 101518.000000 | 4236.117925 | 4839.000000 | 589622.641509 | 160000000.000000 | 816007.075472 | 70000000.000000 | |
| Georgia | 76535.095116 | 277218.000000 | 885.956298 | 3691.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Hong Kong | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 532269.370000 | 9918161.000000 | 30933.000000 | 330600.000000 | |
| India | 4672001.079327 | 11599130.000000 | 70708.079327 | 159755.000000 | 72672294.795673 | 231370546.000000 | 1949145.658654 | 44603841.000000 | |
| Indonesia | 392300.888021 | 1455788.000000 | 12082.231771 | 39447.000000 | 1865629.171875 | 7781193.000000 | 317826.398438 | 7835357.000000 | |
| Iran | 595969.260101 | 1793805.000000 | 27403.255051 | 61724.000000 | 2319587.229798 | 11844528.000000 | 32.828283 | 10000.000000 | |
| Iraq | 298074.859335 | 789390.000000 | 6821.565217 | 13969.000000 | 1884651.910486 | 7498360.000000 | 0.000000 | 0.000000 | |
| Israel | 239729.043038 | 827220.000000 | 1853.668354 | 6082.000000 | 4214791.891139 | 14437280.000000 | 1153293.881013 | 9686464.000000 | |
| Japan | 115264.412736 | 455212.000000 | 2036.846698 | 8802.000000 | 2125313.466981 | 8633325.000000 | 7521.330189 | 578835.000000 | |
| Jordan | 111707.718016 | 526666.000000 | 1381.872063 | 5788.000000 | 1220804.791123 | 5317747.000000 | 1947.830287 | 241868.000000 | |
| Kazakhstan | 120345.053619 | 281798.000000 | 1556.005362 | 3201.000000 | 3126088.214477 | 8211056.000000 | 627.857909 | 109995.000000 | |
| Kuwait | 89033.342711 | 217933.000000 | 544.427110 | 1215.000000 | 724549.557545 | 1941949.000000 | 1464.194373 | 360000.000000 | |
| Kyrgyzstan | 43689.866848 | 87389.000000 | 862.141304 | 1498.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Laos | 27.541436 | 49.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 112.519337 | 40732.000000 | |
| Lebanon | 92094.444162 | 436575.000000 | 943.406091 | 5715.000000 | 0.000000 | 0.000000 | 4564.992386 | 135349.000000 | |
| Macao | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9632.441176 | 27637.000000 | |
| Malaysia | 59239.719715 | 331713.000000 | 282.427553 | 1229.000000 | 1793750.712589 | 7029970.000000 | 9899.733967 | 399525.000000 | |
| Maldives | 8346.089947 | 22373.000000 | 28.335979 | 65.000000 | 171012.134921 | 588650.000000 | 9865.481481 | 212711.000000 | |
| Mongolia | 749.364362 | 4806.000000 | 0.473404 | 4.000000 | 296730.375000 | 2061669.000000 | 3560.569149 | 204121.000000 | |
| Myanmar | 49369.986072 | 142212.000000 | 1093.337047 | 3204.000000 | 595327.573816 | 2482290.000000 | 1069.080780 | 380000.000000 | |
| Nepal | 101298.733967 | 275829.000000 | 735.420428 | 3016.000000 | 831554.337292 | 2218722.000000 | 5401.249406 | 1600000.000000 | |
| Northern Cyprus | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1774.222222 | 11000.000000 | |
| Oman | 76451.038363 | 149135.000000 | 758.416880 | 1620.000000 | 0.000000 | 0.000000 | 4577.020460 | 109844.000000 | |
| Pakistan | 283551.843590 | 626802.000000 | 5985.046154 | 13843.000000 | 3156972.805128 | 9691087.000000 | 1289.430769 | 350000.000000 | |
| Palestine | 58333.060367 | 221391.000000 | 590.986877 | 2406.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Philippines | 232215.197115 | 656056.000000 | 4641.959135 | 12930.000000 | 3012351.947115 | 8938938.000000 | 1554.935096 | 240297.000000 | |
| Qatar | 100689.639896 | 173206.000000 | 158.971503 | 272.000000 | 698088.658031 | 1648555.000000 | 3645.077720 | 510000.000000 | |
| Saudi Arabia | 245337.190104 | 384653.000000 | 3584.747396 | 6602.000000 | 5915013.812500 | 14503622.000000 | 106634.174479 | 2999798.000000 | |
| Singapore | 40674.288416 | 60184.000000 | 21.498818 | 30.000000 | 373765.257683 | 8055714.000000 | 9986.940898 | 792423.000000 | |
| South Korea | 29387.498824 | 98665.000000 | 513.983529 | 1696.000000 | 2012652.098824 | 7176600.000000 | 21106.602353 | 676900.000000 | |
| Sri Lanka | 18167.811456 | 89655.000000 | 92.408115 | 544.000000 | 541971.038186 | 2309954.000000 | 48432.090692 | 824523.000000 | |
| Syria | 5446.060440 | 17240.000000 | 324.148352 | 1153.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Taiwan | 505.732558 | 1005.000000 | 6.051163 | 10.000000 | 85052.120930 | 183386.000000 | 0.000000 | 0.000000 | |
| Tajikistan | 9405.175385 | 13308.000000 | 71.316923 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Thailand | 5812.058824 | 27594.000000 | 49.151584 | 90.000000 | 864887.124434 | 2894666.000000 | 215.031674 | 53842.000000 | |
| Timor | 40.447802 | 271.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Turkey | 835822.845333 | 2992694.000000 | 11075.733333 | 29959.000000 | 12276750.498667 | 35787480.000000 | 982927.338667 | 13029754.000000 | |
| United Arab Emirates | 115633.733813 | 438638.000000 | 428.280576 | 1433.000000 | 10221291.400480 | 34913667.000000 | 768939.309353 | 7181056.000000 | |
| Uzbekistan | 42940.285714 | 81339.000000 | 335.913747 | 622.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Vietnam | 897.810875 | 2572.000000 | 18.139480 | 35.000000 | 92152.125296 | 1469955.000000 | 353.342790 | 30971.000000 | |
| Yemen | 1629.878261 | 3278.000000 | 460.168116 | 737.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
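The dictionary form of `agg` used above produces two-level column headers. As a minimal sketch, assuming flat column names are preferred, pandas named aggregation yields one plain column per statistic. The toy frame `toy_df`, its values, and the output column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with cumulative counts; the values are invented.
toy_df = pd.DataFrame({
    "location": ["Japan", "Japan", "Japan", "Nepal", "Nepal"],
    "total_cases": [10.0, 20.0, 60.0, 5.0, 15.0],
    "total_deaths": [0.0, 1.0, 2.0, 0.0, 1.0],
})

# Named aggregation: output_column=(input_column, function).
summary = toy_df.groupby("location").agg(
    total_cases_mean=("total_cases", "mean"),
    total_cases_max=("total_cases", "max"),
    total_deaths_mean=("total_deaths", "mean"),
    total_deaths_max=("total_deaths", "max"),
)
print(summary)
```

Because columns such as total_cases are cumulative, the mean over dates depends on how long a country has been reporting, while the max is the latest cumulative total as long as the series never decreases.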
# Mean and max of total_cases, total_deaths, total_tests and total_vaccinations for each country in Europe
data_total=covid_df[covid_df['continent'] == 'Europe']
data_total.groupby(['continent','location']).agg({'total_cases': ['mean','max'],'total_deaths':['mean','max'],'total_tests':['mean','max'],'total_vaccinations':['mean','max']}).style.background_gradient(cmap="CMRmap_r")
| location | total_cases mean | total_cases max | total_deaths mean | total_deaths max | total_tests mean | total_tests max | total_vaccinations mean | total_vaccinations max | |
|---|---|---|---|---|---|---|---|---|---|
| Albania | 27496.753846 | 120541.000000 | 542.628205 | 2133.000000 | 121225.482051 | 497742.000000 | 358.548718 | 33369.000000 | |
| Andorra | 3850.583333 | 11481.000000 | 61.265625 | 113.000000 | 4918.734375 | 162071.000000 | 52.361979 | 4914.000000 | |
| Austria | 143926.628205 | 511440.000000 | 2512.548718 | 9052.000000 | 3093094.464103 | 19277527.000000 | 88327.330769 | 1239208.000000 | |
| Belarus | 105512.888889 | 309293.000000 | 812.855297 | 2148.000000 | 526215.829457 | 5178695.000000 | 131.638243 | 30000.000000 | |
| Belgium | 271334.396594 | 827941.000000 | 11233.172749 | 22650.000000 | 3408751.282238 | 10322890.000000 | 100065.007299 | 1323086.000000 | |
| Bosnia and Herzegovina | 47326.703412 | 151337.000000 | 1662.503937 | 5773.000000 | 213201.230971 | 733032.000000 | 0.000000 | 0.000000 | |
| Bulgaria | 78287.687831 | 302480.000000 | 2912.796296 | 11966.000000 | 529062.817460 | 1915561.000000 | 25370.275132 | 366547.000000 | |
| Croatia | 73024.387179 | 256805.000000 | 1434.812821 | 5753.000000 | 388907.628205 | 1461537.000000 | 8242.951282 | 365082.000000 | |
| Cyprus | 9437.673740 | 41882.000000 | 63.230769 | 242.000000 | 602203.002653 | 3044737.000000 | 1503.787798 | 129438.000000 | |
| Czechia | 323394.974293 | 1459406.000000 | 5140.961440 | 24530.000000 | 0.000000 | 0.000000 | 95911.832905 | 1330675.000000 | |
| Denmark | 62748.303483 | 225540.000000 | 851.870647 | 2400.000000 | 4858681.820896 | 19408441.000000 | 79063.845771 | 935185.000000 | |
| Estonia | 14311.394872 | 94028.000000 | 150.310256 | 780.000000 | 323216.153846 | 1071562.000000 | 12716.071795 | 220882.000000 | |
| Faeroe Islands | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2280.510204 | 9342.000000 | |
| Finland | 17232.170264 | 71123.000000 | 344.328537 | 805.000000 | 1115747.035971 | 3734426.000000 | 48220.817746 | 812039.000000 | |
| France | 1138253.457346 | 4277183.000000 | 36869.215640 | 92119.000000 | 0.000000 | 0.000000 | 479563.850711 | 7927771.000000 | |
| Germany | 707616.916468 | 2669233.000000 | 18247.847255 | 74706.000000 | 2368382.556086 | 47511887.000000 | 769901.868735 | 10473852.000000 | |
| Gibraltar | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 24921.085714 | 52331.000000 | |
| Greece | 55152.391858 | 235611.000000 | 1747.959288 | 7421.000000 | 1564811.786260 | 5944445.000000 | 97515.226463 | 1436491.000000 | |
| Guernsey | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2345.041667 | 29383.000000 | |
| Hungary | 123337.568063 | 560971.000000 | 3960.986911 | 18068.000000 | 1059831.206806 | 3751450.000000 | 127689.473822 | 2038133.000000 | |
| Iceland | 3295.149100 | 6097.000000 | 15.313625 | 29.000000 | 132377.637532 | 290437.000000 | 2127.336761 | 52604.000000 | |
| Ireland | 66086.012821 | 229831.000000 | 1903.846154 | 4585.000000 | 1286446.820513 | 3786972.000000 | 44057.589744 | 639586.000000 | |
| Isle of Man | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 15432.666667 | 33336.000000 | |
| Italy | 875866.281928 | 3356331.000000 | 42791.809639 | 104642.000000 | 13117156.293976 | 45894515.000000 | 586145.142169 | 7708889.000000 | |
| Jersey | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5268.440000 | 45758.000000 | |
| Kosovo | 23996.236559 | 80295.000000 | 642.766129 | 1752.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
| Latvia | 18431.976923 | 97149.000000 | 324.702564 | 1821.000000 | 453499.400000 | 1671987.000000 | 7445.574359 | 102320.000000 | |
| Liechtenstein | 788.968586 | 2627.000000 | 13.811518 | 56.000000 | 0.000000 | 0.000000 | 66.479058 | 4215.000000 | |
| Lithuania | 50231.503876 | 208650.000000 | 739.069767 | 3464.000000 | 841210.359173 | 2264247.000000 | 31723.829457 | 399863.000000 | |
| Luxembourg | 19337.785166 | 58955.000000 | 232.685422 | 714.000000 | 815707.519182 | 2280826.000000 | 3365.920716 | 70339.000000 | |
| Malta | 6024.591687 | 27904.000000 | 83.332518 | 369.000000 | 256224.574572 | 772206.000000 | 9488.924205 | 140331.000000 | |
| Moldova | 68266.780423 | 214203.000000 | 1542.140212 | 4531.000000 | 0.000000 | 0.000000 | 268.455026 | 18593.000000 | |
| Monaco | 507.873057 | 2173.000000 | 5.569948 | 27.000000 | 0.000000 | 0.000000 | 93.505181 | 18081.000000 | |
| Montenegro | 22491.146341 | 86782.000000 | 313.325203 | 1194.000000 | 0.000000 | 0.000000 | 140.181572 | 7298.000000 | |
| Netherlands | 337849.726343 | 1211447.000000 | 7574.644501 | 16395.000000 | 365637.345269 | 7184008.000000 | 19897.066496 | 1887726.000000 | |
| North Macedonia | 34666.272494 | 118736.000000 | 1097.457584 | 3448.000000 | 199062.809769 | 578312.000000 | 13.624679 | 5300.000000 | |
| Norway | 24624.683673 | 86362.000000 | 300.262755 | 648.000000 | 1371938.622449 | 4274069.000000 | 57095.441327 | 758514.000000 | |
| Poland | 511944.769634 | 2036700.000000 | 11805.534031 | 49159.000000 | 3690909.712042 | 10639405.000000 | 341900.761780 | 4983494.000000 | |
| Portugal | 217262.483117 | 817080.000000 | 4287.057143 | 16762.000000 | 3046625.184416 | 8671839.000000 | 107622.909091 | 1325266.000000 | |
| Romania | 260677.143959 | 892848.000000 | 7062.737789 | 22132.000000 | 1941957.696658 | 6445769.000000 | 199985.588689 | 2426191.000000 | |
| Russia | 1449070.896386 | 4397816.000000 | 26344.183133 | 93090.000000 | 39430406.373494 | 116724405.000000 | 313259.971084 | 8306498.000000 | |
| San Marino | 1303.175258 | 4356.000000 | 45.201031 | 79.000000 | 0.000000 | 0.000000 | 178.778351 | 7923.000000 | |
| Serbia | 126718.807198 | 546896.000000 | 1370.326478 | 4900.000000 | 1148183.123393 | 3218400.000000 | 135700.277635 | 2163593.000000 | |
| Slovakia | 77540.144737 | 347944.000000 | 1320.644737 | 8978.000000 | 2540943.868421 | 21061465.000000 | 57538.084211 | 718369.000000 | |
| Slovenia | 46344.532688 | 205509.000000 | 963.242131 | 3967.000000 | 407755.222760 | 2313303.000000 | 22672.167070 | 286151.000000 | |
| Spain | 982592.074879 | 3212332.000000 | 32955.731884 | 72910.000000 | 1521440.183575 | 34785710.000000 | 325315.521739 | 5993363.000000 | |
| Sweden | 186260.644928 | 744272.000000 | 5708.398551 | 13262.000000 | 0.000000 | 0.000000 | 24882.821256 | 1293923.000000 | |
| Switzerland | 167805.045024 | 580609.000000 | 3434.443128 | 10203.000000 | 1515279.417062 | 4699813.000000 | 81363.518957 | 1176875.000000 | |
| Ukraine | 444491.861619 | 1584972.000000 | 8394.133159 | 31344.000000 | 1838510.422977 | 7573737.000000 | 1884.511749 | 108310.000000 | |
| United Kingdom | 1142338.734940 | 4304839.000000 | 48199.161446 | 126359.000000 | 25552517.004819 | 107584947.000000 | 2517574.086747 | 28985958.000000 | |
| Vatican | 17.268421 | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
# Show the mean and max of total_cases, total_deaths, total_tests and total_vaccinations for countries in North America
data_total=covid_df[covid_df['continent'] == 'North America']
data_total.groupby(['continent','location']).agg({'total_cases': ['mean','max'],'total_deaths':['mean','max'],'total_tests':['mean','max'],'total_vaccinations':['mean','max']}).style.background_gradient(cmap="CMRmap_r")
| continent | location | total_cases (mean) | total_cases (max) | total_deaths (mean) | total_deaths (max) | total_tests (mean) | total_tests (max) | total_vaccinations (mean) | total_vaccinations (max) |
|---|---|---|---|---|---|---|---|---|---|
| North America | Anguilla | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 417.727273 | 5348.000000 |
| North America | Antigua and Barbuda | 159.664879 | 1033.000000 | 4.772118 | 28.000000 | 0.000000 | 0.000000 | 133.621984 | 25677.000000 |
| North America | Bahamas | 3864.724324 | 8800.000000 | 85.605405 | 186.000000 | 0.000000 | 0.000000 | 0.297297 | 110.000000 |
| North America | Barbados | 562.886179 | 3533.000000 | 9.544715 | 39.000000 | 0.000000 | 0.000000 | 3205.623306 | 58214.000000 |
| North America | Belize | 4180.584022 | 12400.000000 | 96.738292 | 316.000000 | 0.000000 | 0.000000 | 220.085399 | 15006.000000 |
| North America | Bermuda | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2816.246154 | 30481.000000 |
| North America | Canada | 265005.852381 | 935932.000000 | 9559.716667 | 22635.000000 | 2676569.345238 | 26250445.000000 | 261255.052381 | 3862685.000000 |
| North America | Cayman Islands | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4332.476190 | 39145.000000 |
| North America | Costa Rica | 78812.997368 | 211903.000000 | 1007.471053 | 2896.000000 | 209663.102632 | 623493.000000 | 3394.173684 | 248082.000000 |
| North America | Cuba | 10733.294118 | 65962.000000 | 124.740642 | 392.000000 | 774317.021390 | 2730305.000000 | 0.000000 | 0.000000 |
| North America | Dominica | 52.634615 | 156.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 57.052198 | 13565.000000 |
| North America | Dominican Republic | 99771.244156 | 248989.000000 | 1566.116883 | 3269.000000 | 427707.618182 | 1263135.000000 | 10422.646753 | 675000.000000 |
| North America | El Salvador | 26734.095368 | 62531.000000 | 780.168937 | 1975.000000 | 162169.149864 | 717882.000000 | 339.269755 | 41512.000000 |
| North America | Greenland | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 226.882353 | 5130.000000 |
| North America | Grenada | 54.082418 | 154.000000 | 0.211538 | 1.000000 | 0.000000 | 0.000000 | 32.285714 | 8606.000000 |
| North America | Guatemala | 73889.419598 | 187659.000000 | 2637.716080 | 6685.000000 | 309333.449749 | 995537.000000 | 980.753769 | 66399.000000 |
| North America | Haiti | 7147.516393 | 12700.000000 | 163.997268 | 251.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| North America | Honduras | 70863.101333 | 181931.000000 | 1897.346667 | 4430.000000 | 0.000000 | 0.000000 | 201.789333 | 37317.000000 |
| North America | Jamaica | 7204.192000 | 34665.000000 | 143.296000 | 524.000000 | 48877.738667 | 243363.000000 | 75.186667 | 16096.000000 |
| North America | Mexico | 670242.629213 | 2193639.000000 | 63141.078652 | 197827.000000 | 1581521.896629 | 5399163.000000 | 264364.887640 | 5459014.000000 |
| North America | Montserrat | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 51.096774 | 932.000000 |
| North America | Nicaragua | 3943.362398 | 6582.000000 | 113.700272 | 176.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| North America | Panama | 127367.350133 | 350665.000000 | 2316.992042 | 6042.000000 | 627499.267905 | 2035210.000000 | 12984.485411 | 297165.000000 |
| North America | Saint Kitts and Nevis | 22.116343 | 44.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 40.119114 | 7580.000000 |
| North America | Saint Lucia | 501.247312 | 4113.000000 | 5.475806 | 55.000000 | 0.000000 | 0.000000 | 106.801075 | 20247.000000 |
| North America | Saint Vincent and the Grenadines | 269.379032 | 1696.000000 | 0.919355 | 9.000000 | 0.000000 | 0.000000 | 49.287634 | 9383.000000 |
| North America | Trinidad and Tobago | 3476.973118 | 7839.000000 | 63.903226 | 140.000000 | 30720.443548 | 103786.000000 | 3.846774 | 991.000000 |
| North America | Turks and Caicos Islands | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 214.433333 | 6433.000000 |
| North America | United States | 9004664.099057 | 29785285.000000 | 197045.080189 | 541926.000000 | 113629930.792453 | 355058178.000000 | 8889080.884434 | 121441497.000000 |
Coronavirus continues to spread across the world, with almost 100 million confirmed cases in 191 countries and more than two million deaths. The virus has been detected in nearly every country, as these maps show.
worldwide_spread=covid_df[["continent","location","total_cases","total_tests","date","total_deaths","positive_rate","total_vaccinations","people_fully_vaccinated"]]
df=worldwide_spread.dropna(axis=0)
# sort_values returns a new DataFrame, so assign the result to keep the ordering
df=df.sort_values("total_tests",ascending=False)
df_loc=df.groupby(['location']).max()
df_loc.drop(["date"],axis=1,inplace=True)
df_loc
# Confirmed cases as a percentage of tests; locations with no reported tests stay NaN
for i,r in df_loc.iterrows():
    if r["total_tests"]>0:
        df_loc.loc[i,"test per confirmed(%)"]=(r["total_cases"]/r["total_tests"])*100
df_covid=df_loc.reset_index()
df_covid.style.background_gradient(cmap="CMRmap_r")
| | location | continent | total_cases | total_tests | total_deaths | positive_rate | total_vaccinations | people_fully_vaccinated | test per confirmed(%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 56093.000000 | 0.000000 | 2462.000000 | 0.000000 | 54000.000000 | 0.000000 | nan |
| 1 | Albania | Europe | 120541.000000 | 497742.000000 | 2133.000000 | 0.409000 | 33369.000000 | 655.000000 | 24.217567 |
| 2 | Algeria | Africa | 116066.000000 | 0.000000 | 3055.000000 | 0.000000 | 75000.000000 | 0.000000 | nan |
| 3 | Andorra | Europe | 11481.000000 | 162071.000000 | 113.000000 | 0.157000 | 4914.000000 | 1264.000000 | 7.083932 |
| 4 | Angola | Africa | 21696.000000 | 0.000000 | 526.000000 | 0.000000 | 49000.000000 | 0.000000 | nan |
| 5 | Anguilla | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5348.000000 | 0.000000 | nan |
| 6 | Antigua and Barbuda | North America | 1033.000000 | 0.000000 | 28.000000 | 0.000000 | 25677.000000 | 0.000000 | nan |
| 7 | Argentina | South America | 2241739.000000 | 6838489.000000 | 54517.000000 | 0.508000 | 3063864.000000 | 591438.000000 | 32.781204 |
| 8 | Armenia | Asia | 183127.000000 | 788953.000000 | 3332.000000 | 0.504000 | 0.000000 | 0.000000 | 23.211395 |
| 9 | Australia | Oceania | 29196.000000 | 15072203.000000 | 909.000000 | 0.038000 | 253831.000000 | 0.000000 | 0.193708 |
| 10 | Austria | Europe | 511440.000000 | 19277527.000000 | 9052.000000 | 0.247000 | 1239208.000000 | 311203.000000 | 2.653037 |
| 11 | Azerbaijan | Asia | 245490.000000 | 0.000000 | 3339.000000 | 0.000000 | 453586.000000 | 0.000000 | nan |
| 12 | Bahamas | North America | 8800.000000 | 0.000000 | 186.000000 | 0.000000 | 110.000000 | 0.000000 | nan |
| 13 | Bahrain | Asia | 135326.000000 | 3368947.000000 | 498.000000 | 0.079000 | 640104.000000 | 232782.000000 | 4.016863 |
| 14 | Bangladesh | Asia | 568706.000000 | 4349615.000000 | 8668.000000 | 0.243000 | 4760747.000000 | 0.000000 | 13.074858 |
| 15 | Barbados | North America | 3533.000000 | 0.000000 | 39.000000 | 0.000000 | 58214.000000 | 0.000000 | nan |
| 16 | Belarus | Europe | 309293.000000 | 5178695.000000 | 2148.000000 | 0.122000 | 30000.000000 | 10000.000000 | 5.972412 |
| 17 | Belgium | Europe | 827941.000000 | 10322890.000000 | 22650.000000 | 0.327000 | 1323086.000000 | 419430.000000 | 8.020438 |
| 18 | Belize | North America | 12400.000000 | 0.000000 | 316.000000 | 0.000000 | 15006.000000 | 0.000000 | nan |
| 19 | Benin | Africa | 6818.000000 | 0.000000 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 20 | Bermuda | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 30481.000000 | 11674.000000 | nan |
| 21 | Bhutan | Asia | 869.000000 | 570591.000000 | 1.000000 | 0.012000 | 0.000000 | 0.000000 | 0.152298 |
| 22 | Bolivia | South America | 264411.000000 | 815270.000000 | 12051.000000 | 0.636000 | 164984.000000 | 16939.000000 | 32.432323 |
| 23 | Bosnia and Herzegovina | Europe | 151337.000000 | 733032.000000 | 5773.000000 | 0.742000 | 0.000000 | 0.000000 | 20.645347 |
| 24 | Botswana | Africa | 35493.000000 | 0.000000 | 458.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 25 | Brazil | South America | 11950459.000000 | 6421441.000000 | 292752.000000 | 0.000000 | 13479165.000000 | 3380095.000000 | 186.102450 |
| 26 | Brunei | Asia | 205.000000 | 0.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 27 | Bulgaria | Europe | 302480.000000 | 1915561.000000 | 11966.000000 | 0.406000 | 366547.000000 | 70753.000000 | 15.790674 |
| 28 | Burkina Faso | Africa | 12516.000000 | 0.000000 | 145.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 29 | Burundi | Africa | 2563.000000 | 0.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 30 | Cambodia | Asia | 1578.000000 | 0.000000 | 2.000000 | 0.000000 | 170659.000000 | 0.000000 | nan |
| 31 | Cameroon | Africa | 40622.000000 | 0.000000 | 601.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 32 | Canada | North America | 935932.000000 | 26250445.000000 | 22635.000000 | 0.035000 | 3862685.000000 | 626214.000000 | 3.565395 |
| 33 | Cape Verde | Africa | 16440.000000 | 0.000000 | 159.000000 | 0.216000 | 0.000000 | 0.000000 | nan |
| 34 | Cayman Islands | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 39145.000000 | 12824.000000 | nan |
| 35 | Central African Republic | Africa | 5075.000000 | 0.000000 | 64.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 36 | Chad | Africa | 4410.000000 | 0.000000 | 157.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 37 | Chile | South America | 925089.000000 | 10343273.000000 | 22180.000000 | 0.364000 | 8464110.000000 | 2867453.000000 | 8.943871 |
| 38 | China | Asia | 101518.000000 | 160000000.000000 | 4839.000000 | 0.000000 | 70000000.000000 | 0.000000 | 0.063449 |
| 39 | Colombia | South America | 2331187.000000 | 12059588.000000 | 61907.000000 | 0.332000 | 1131999.000000 | 54162.000000 | 19.330569 |
| 40 | Comoros | Africa | 3666.000000 | 0.000000 | 146.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 41 | Congo | Africa | 9564.000000 | 0.000000 | 134.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 42 | Costa Rica | North America | 211903.000000 | 623493.000000 | 2896.000000 | 0.426000 | 248082.000000 | 57994.000000 | 33.986428 |
| 43 | Cote d'Ivoire | Africa | 39913.000000 | 470459.000000 | 217.000000 | 0.295000 | 22443.000000 | 0.000000 | 8.483842 |
| 44 | Croatia | Europe | 256805.000000 | 1461537.000000 | 5753.000000 | 0.363000 | 365082.000000 | 76460.000000 | 17.570886 |
| 45 | Cuba | North America | 65962.000000 | 2730305.000000 | 392.000000 | 0.077000 | 0.000000 | 0.000000 | 2.415921 |
| 46 | Cyprus | Europe | 41882.000000 | 3044737.000000 | 242.000000 | 0.060000 | 129438.000000 | 35963.000000 | 1.375554 |
| 47 | Czechia | Europe | 1459406.000000 | 0.000000 | 24530.000000 | 0.321000 | 1330675.000000 | 357083.000000 | nan |
| 48 | Democratic Republic of Congo | Africa | 27468.000000 | 0.000000 | 726.000000 | 0.452000 | 0.000000 | 0.000000 | nan |
| 49 | Denmark | Europe | 225540.000000 | 19408441.000000 | 2400.000000 | 0.195000 | 935185.000000 | 306288.000000 | 1.162072 |
| 50 | Djibouti | Africa | 6518.000000 | 0.000000 | 63.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 51 | Dominica | North America | 156.000000 | 0.000000 | 0.000000 | 0.000000 | 13565.000000 | 0.000000 | nan |
| 52 | Dominican Republic | North America | 248989.000000 | 1263135.000000 | 3269.000000 | 0.404000 | 675000.000000 | 0.000000 | 19.711986 |
| 53 | Ecuador | South America | 310868.000000 | 1023967.000000 | 16435.000000 | 0.407000 | 141191.000000 | 20137.000000 | 30.359181 |
| 54 | Egypt | Africa | 194771.000000 | 0.000000 | 11557.000000 | 0.000000 | 1315.000000 | 0.000000 | nan |
| 55 | El Salvador | North America | 62531.000000 | 717882.000000 | 1975.000000 | 0.175000 | 41512.000000 | 0.000000 | 8.710484 |
| 56 | Equatorial Guinea | Africa | 6736.000000 | 0.000000 | 100.000000 | 0.000000 | 6565.000000 | 800.000000 | nan |
| 57 | Eritrea | Africa | 3118.000000 | 0.000000 | 7.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 58 | Estonia | Europe | 94028.000000 | 1071562.000000 | 780.000000 | 0.203000 | 220882.000000 | 56946.000000 | 8.774854 |
| 59 | Eswatini | Africa | 17283.000000 | 0.000000 | 665.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 60 | Ethiopia | Africa | 185641.000000 | 2256439.000000 | 2647.000000 | 0.216000 | 0.000000 | 0.000000 | 8.227167 |
| 61 | Faeroe Islands | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9342.000000 | 4033.000000 | nan |
| 62 | Falkland Islands | South America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1732.000000 | 0.000000 | nan |
| 63 | Fiji | Oceania | 67.000000 | 34687.000000 | 2.000000 | 0.381000 | 0.000000 | 0.000000 | 0.193156 |
| 64 | Finland | Europe | 71123.000000 | 3734426.000000 | 805.000000 | 0.118000 | 812039.000000 | 87515.000000 | 1.904523 |
| 65 | France | Europe | 4277183.000000 | 0.000000 | 92119.000000 | 0.159000 | 7927771.000000 | 2297100.000000 | nan |
| 66 | Gabon | Africa | 17711.000000 | 0.000000 | 106.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 67 | Gambia | Africa | 5153.000000 | 52362.000000 | 160.000000 | 0.510000 | 0.000000 | 0.000000 | 9.841106 |
| 68 | Georgia | Asia | 277218.000000 | 0.000000 | 3691.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 69 | Germany | Europe | 2669233.000000 | 47511887.000000 | 74706.000000 | 0.154000 | 10473852.000000 | 3245985.000000 | 5.618032 |
| 70 | Ghana | Africa | 89276.000000 | 953041.000000 | 716.000000 | 0.319000 | 420000.000000 | 0.000000 | 9.367488 |
| 71 | Gibraltar | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 52331.000000 | 22293.000000 | nan |
| 72 | Greece | Europe | 235611.000000 | 5944445.000000 | 7421.000000 | 0.105000 | 1436491.000000 | 459446.000000 | 3.963549 |
| 73 | Greenland | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5130.000000 | 1203.000000 | nan |
| 74 | Grenada | North America | 154.000000 | 0.000000 | 1.000000 | 0.000000 | 8606.000000 | 0.000000 | nan |
| 75 | Guatemala | North America | 187659.000000 | 995537.000000 | 6685.000000 | 0.513000 | 66399.000000 | 0.000000 | 18.850028 |
| 76 | Guernsey | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 29383.000000 | 7883.000000 | nan |
| 77 | Guinea | Africa | 18562.000000 | 0.000000 | 108.000000 | 0.000000 | 25263.000000 | 0.000000 | nan |
| 78 | Guinea-Bissau | Africa | 3558.000000 | 0.000000 | 55.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 79 | Guyana | South America | 9585.000000 | 0.000000 | 214.000000 | 0.000000 | 15524.000000 | 0.000000 | nan |
| 80 | Haiti | North America | 12700.000000 | 0.000000 | 251.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 81 | Honduras | North America | 181931.000000 | 0.000000 | 4430.000000 | 0.000000 | 37317.000000 | 0.000000 | nan |
| 82 | Hong Kong | Asia | 0.000000 | 9918161.000000 | 0.000000 | 0.000000 | 330600.000000 | 0.000000 | 0.000000 |
| 83 | Hungary | Europe | 560971.000000 | 3751450.000000 | 18068.000000 | 0.354000 | 2038133.000000 | 474891.000000 | 14.953445 |
| 84 | Iceland | Europe | 6097.000000 | 290437.000000 | 29.000000 | 0.132000 | 52604.000000 | 14739.000000 | 2.099250 |
| 85 | India | Asia | 11599130.000000 | 231370546.000000 | 159755.000000 | 0.127000 | 44603841.000000 | 7478654.000000 | 5.013227 |
| 86 | Indonesia | Asia | 1455788.000000 | 7781193.000000 | 39447.000000 | 0.424000 | 7835357.000000 | 2301978.000000 | 18.709059 |
| 87 | Iran | Asia | 1793805.000000 | 11844528.000000 | 61724.000000 | 0.326000 | 10000.000000 | 0.000000 | 15.144588 |
| 88 | Iraq | Asia | 789390.000000 | 7498360.000000 | 13969.000000 | 0.215000 | 0.000000 | 0.000000 | 10.527502 |
| 89 | Ireland | Europe | 229831.000000 | 3786972.000000 | 4585.000000 | 0.255000 | 639586.000000 | 171258.000000 | 6.068991 |
| 90 | Isle of Man | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 33336.000000 | 11340.000000 | nan |
| 91 | Israel | Asia | 827220.000000 | 14437280.000000 | 6082.000000 | 0.130000 | 9686464.000000 | 4523828.000000 | 5.729750 |
| 92 | Italy | Europe | 3356331.000000 | 45894515.000000 | 104642.000000 | 0.268000 | 7708889.000000 | 2443394.000000 | 7.313142 |
| 93 | Jamaica | North America | 34665.000000 | 243363.000000 | 524.000000 | 0.290000 | 16096.000000 | 0.000000 | 14.244154 |
| 94 | Japan | Asia | 455212.000000 | 8633325.000000 | 8802.000000 | 0.204000 | 578835.000000 | 25381.000000 | 5.272731 |
| 95 | Jersey | Europe | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 45758.000000 | 5621.000000 | nan |
| 96 | Jordan | Asia | 526666.000000 | 5317747.000000 | 5788.000000 | 0.247000 | 241868.000000 | 52412.000000 | 9.903931 |
| 97 | Kazakhstan | Asia | 281798.000000 | 8211056.000000 | 3201.000000 | 0.228000 | 109995.000000 | 19247.000000 | 3.431934 |
| 98 | Kenya | Africa | 120163.000000 | 1282799.000000 | 1994.000000 | 0.192000 | 20000.000000 | 0.000000 | 9.367251 |
| 99 | Kosovo | Europe | 80295.000000 | 0.000000 | 1752.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 100 | Kuwait | Asia | 217933.000000 | 1941949.000000 | 1215.000000 | 0.263000 | 360000.000000 | 38000.000000 | 11.222385 |
| 101 | Kyrgyzstan | Asia | 87389.000000 | 0.000000 | 1498.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 102 | Laos | Asia | 49.000000 | 0.000000 | 0.000000 | 0.000000 | 40732.000000 | 0.000000 | nan |
| 103 | Latvia | Europe | 97149.000000 | 1671987.000000 | 1821.000000 | 0.119000 | 102320.000000 | 18936.000000 | 5.810392 |
| 104 | Lebanon | Asia | 436575.000000 | 0.000000 | 5715.000000 | 0.000000 | 135349.000000 | 42752.000000 | nan |
| 105 | Lesotho | Africa | 10535.000000 | 0.000000 | 309.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 106 | Liberia | Africa | 2042.000000 | 0.000000 | 85.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 107 | Libya | Africa | 150341.000000 | 0.000000 | 2487.000000 | 0.268000 | 0.000000 | 0.000000 | nan |
| 108 | Liechtenstein | Europe | 2627.000000 | 0.000000 | 56.000000 | 0.000000 | 4215.000000 | 0.000000 | nan |
| 109 | Lithuania | Europe | 208650.000000 | 2264247.000000 | 3464.000000 | 0.254000 | 399863.000000 | 124481.000000 | 9.214984 |
| 110 | Luxembourg | Europe | 58955.000000 | 2280826.000000 | 714.000000 | 0.216000 | 70339.000000 | 17469.000000 | 2.584809 |
| 111 | Macao | Asia | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 27637.000000 | 0.000000 | nan |
| 112 | Madagascar | Africa | 22275.000000 | 132187.000000 | 345.000000 | 0.547000 | 0.000000 | 0.000000 | 16.851128 |
| 113 | Malawi | Africa | 33216.000000 | 210730.000000 | 1093.000000 | 0.377000 | 15326.000000 | 0.000000 | 15.762350 |
| 114 | Malaysia | Asia | 331713.000000 | 7029970.000000 | 1229.000000 | 0.163000 | 399525.000000 | 0.000000 | 4.718555 |
| 115 | Maldives | Asia | 22373.000000 | 588650.000000 | 65.000000 | 0.142000 | 212711.000000 | 0.000000 | 3.800730 |
| 116 | Mali | Africa | 9270.000000 | 0.000000 | 367.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 117 | Malta | Europe | 27904.000000 | 772206.000000 | 369.000000 | 0.077000 | 140331.000000 | 43267.000000 | 3.613544 |
| 118 | Marshall Islands | Oceania | 4.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 119 | Mauritania | Africa | 17587.000000 | 233820.000000 | 446.000000 | 0.186000 | 0.000000 | 0.000000 | 7.521598 |
| 120 | Mauritius | Africa | 796.000000 | 0.000000 | 10.000000 | 0.000000 | 3843.000000 | 0.000000 | nan |
| 121 | Mexico | North America | 2193639.000000 | 5399163.000000 | 197827.000000 | 0.531000 | 5459014.000000 | 695667.000000 | 40.629242 |
| 122 | Micronesia (country) | Oceania | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 123 | Moldova | Europe | 214203.000000 | 0.000000 | 4531.000000 | 0.000000 | 18593.000000 | 0.000000 | nan |
| 124 | Monaco | Europe | 2173.000000 | 0.000000 | 27.000000 | 0.000000 | 18081.000000 | 8331.000000 | nan |
| 125 | Mongolia | Asia | 4806.000000 | 2061669.000000 | 4.000000 | 0.017000 | 204121.000000 | 0.000000 | 0.233112 |
| 126 | Montenegro | Europe | 86782.000000 | 0.000000 | 1194.000000 | 0.000000 | 7298.000000 | 461.000000 | nan |
| 127 | Montserrat | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 932.000000 | 40.000000 | nan |
| 128 | Morocco | Africa | 491463.000000 | 5347174.000000 | 8763.000000 | 0.265000 | 6687548.000000 | 2423380.000000 | 9.191079 |
| 129 | Mozambique | Africa | 66064.000000 | 458121.000000 | 743.000000 | 0.336000 | 46439.000000 | 0.000000 | 14.420644 |
| 130 | Myanmar | Asia | 142212.000000 | 2482290.000000 | 3204.000000 | 0.218000 | 380000.000000 | 0.000000 | 5.729065 |
| 131 | Namibia | Africa | 42203.000000 | 327300.000000 | 492.000000 | 0.257000 | 0.000000 | 0.000000 | 12.894287 |
| 132 | Nepal | Asia | 275829.000000 | 2218722.000000 | 3016.000000 | 0.252000 | 1600000.000000 | 0.000000 | 12.431886 |
| 133 | Netherlands | Europe | 1211447.000000 | 7184008.000000 | 16395.000000 | 0.292000 | 1887726.000000 | 493123.000000 | 16.863108 |
| 134 | New Zealand | Oceania | 2453.000000 | 1840473.000000 | 26.000000 | 0.040000 | 27000.000000 | 0.000000 | 0.133281 |
| 135 | Nicaragua | North America | 6582.000000 | 0.000000 | 176.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 136 | Niger | Africa | 4918.000000 | 0.000000 | 185.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 137 | Nigeria | Africa | 161651.000000 | 1684305.000000 | 2030.000000 | 0.298000 | 8000.000000 | 0.000000 | 9.597490 |
| 138 | North Macedonia | Europe | 118736.000000 | 578312.000000 | 3448.000000 | 0.398000 | 5300.000000 | 0.000000 | 20.531478 |
| 139 | Northern Cyprus | Asia | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 11000.000000 | 0.000000 | nan |
| 140 | Norway | Europe | 86362.000000 | 4274069.000000 | 648.000000 | 0.095000 | 758514.000000 | 261503.000000 | 2.020604 |
| 141 | Oman | Asia | 149135.000000 | 0.000000 | 1620.000000 | 0.361000 | 109844.000000 | 19019.000000 | nan |
| 142 | Pakistan | Asia | 626802.000000 | 9691087.000000 | 13843.000000 | 0.256000 | 350000.000000 | 0.000000 | 6.467819 |
| 143 | Palestine | Asia | 221391.000000 | 0.000000 | 2406.000000 | 0.317000 | 0.000000 | 0.000000 | nan |
| 144 | Panama | North America | 350665.000000 | 2035210.000000 | 6042.000000 | 0.414000 | 297165.000000 | 0.000000 | 17.229917 |
| 145 | Papua New Guinea | Oceania | 3085.000000 | 0.000000 | 36.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 146 | Paraguay | South America | 192599.000000 | 821942.000000 | 3695.000000 | 0.446000 | 14696.000000 | 0.000000 | 23.432189 |
| 147 | Peru | South America | 1451645.000000 | 4395068.000000 | 49897.000000 | 0.377000 | 623800.000000 | 187613.000000 | 33.028954 |
| 148 | Philippines | Asia | 656056.000000 | 8938938.000000 | 12930.000000 | 0.149000 | 240297.000000 | 0.000000 | 7.339306 |
| 149 | Poland | Europe | 2036700.000000 | 10639405.000000 | 49159.000000 | 0.503000 | 4983494.000000 | 1769770.000000 | 19.142988 |
| 150 | Portugal | Europe | 817080.000000 | 8671839.000000 | 16762.000000 | 0.204000 | 1325266.000000 | 432894.000000 | 9.422223 |
| 151 | Qatar | Asia | 173206.000000 | 1648555.000000 | 272.000000 | 0.393000 | 510000.000000 | 0.000000 | 10.506535 |
| 152 | Romania | Europe | 892848.000000 | 6445769.000000 | 22132.000000 | 0.298000 | 2426191.000000 | 768921.000000 | 13.851691 |
| 153 | Russia | Europe | 4397816.000000 | 116724405.000000 | 93090.000000 | 0.097000 | 8306498.000000 | 2710605.000000 | 3.767692 |
| 154 | Rwanda | Africa | 20761.000000 | 1077055.000000 | 287.000000 | 0.060000 | 329410.000000 | 0.000000 | 1.927571 |
| 155 | Saint Helena | Africa | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 3107.000000 | 0.000000 | nan |
| 156 | Saint Kitts and Nevis | North America | 44.000000 | 0.000000 | 0.000000 | 0.000000 | 7580.000000 | 0.000000 | nan |
| 157 | Saint Lucia | North America | 4113.000000 | 0.000000 | 55.000000 | 0.000000 | 20247.000000 | 0.000000 | nan |
| 158 | Saint Vincent and the Grenadines | North America | 1696.000000 | 0.000000 | 9.000000 | 0.000000 | 9383.000000 | 0.000000 | nan |
| 159 | Samoa | Oceania | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 160 | San Marino | Europe | 4356.000000 | 0.000000 | 79.000000 | 0.000000 | 7923.000000 | 35.000000 | nan |
| 161 | Sao Tome and Principe | Africa | 2142.000000 | 0.000000 | 34.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 162 | Saudi Arabia | Asia | 384653.000000 | 14503622.000000 | 6602.000000 | 0.194000 | 2999798.000000 | 0.000000 | 2.652117 |
| 163 | Senegal | Africa | 37693.000000 | 430894.000000 | 1007.000000 | 0.571000 | 150857.000000 | 0.000000 | 8.747627 |
| 164 | Serbia | Europe | 546896.000000 | 3218400.000000 | 4900.000000 | 0.442000 | 2163593.000000 | 858461.000000 | 16.992791 |
| 165 | Seychelles | Africa | 3770.000000 | 0.000000 | 16.000000 | 0.000000 | 90150.000000 | 27693.000000 | nan |
| 166 | Sierra Leone | Africa | 3948.000000 | 0.000000 | 79.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 167 | Singapore | Asia | 60184.000000 | 8055714.000000 | 30.000000 | 0.300000 | 792423.000000 | 243169.000000 | 0.747097 |
| 168 | Slovakia | Europe | 347944.000000 | 21061465.000000 | 8978.000000 | 0.178000 | 718369.000000 | 229980.000000 | 1.652041 |
| 169 | Slovenia | Europe | 205509.000000 | 2313303.000000 | 3967.000000 | 0.304000 | 286151.000000 | 103865.000000 | 8.883791 |
| 170 | Solomon Islands | Oceania | 18.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 171 | Somalia | Africa | 9968.000000 | 0.000000 | 419.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 172 | South Africa | Africa | 1536801.000000 | 9556404.000000 | 52082.000000 | 0.326000 | 182983.000000 | 182983.000000 | 16.081373 |
| 173 | South Korea | Asia | 98665.000000 | 7176600.000000 | 1696.000000 | 0.049000 | 676900.000000 | 313.000000 | 1.374815 |
| 174 | South Sudan | Africa | 9849.000000 | 124125.000000 | 106.000000 | 0.242000 | 0.000000 | 0.000000 | 7.934743 |
| 175 | Spain | Europe | 3212332.000000 | 34785710.000000 | 72910.000000 | 0.429000 | 5993363.000000 | 1886813.000000 | 9.234631 |
| 176 | Sri Lanka | Asia | 89655.000000 | 2309954.000000 | 544.000000 | 0.096000 | 824523.000000 | 0.000000 | 3.881246 |
| 177 | Sudan | Africa | 30989.000000 | 0.000000 | 1959.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 178 | Suriname | South America | 9061.000000 | 0.000000 | 176.000000 | 0.000000 | 11879.000000 | 0.000000 | nan |
| 179 | Sweden | Europe | 744272.000000 | 0.000000 | 13262.000000 | 0.247000 | 1293923.000000 | 383498.000000 | nan |
| 180 | Switzerland | Europe | 580609.000000 | 4699813.000000 | 10203.000000 | 0.270000 | 1176875.000000 | 432194.000000 | 12.353875 |
| 181 | Syria | Asia | 17240.000000 | 0.000000 | 1153.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 182 | Taiwan | Asia | 1005.000000 | 183386.000000 | 10.000000 | 0.028000 | 0.000000 | 0.000000 | 0.548024 |
| 183 | Tajikistan | Asia | 13308.000000 | 0.000000 | 90.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 184 | Tanzania | Africa | 509.000000 | 0.000000 | 21.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 185 | Thailand | Asia | 27594.000000 | 2894666.000000 | 90.000000 | 0.286000 | 53842.000000 | 0.000000 | 0.953271 |
| 186 | Timor | Asia | 271.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 187 | Togo | Africa | 8839.000000 | 258894.000000 | 102.000000 | 0.134000 | 0.000000 | 0.000000 | 3.414139 |
| 188 | Trinidad and Tobago | North America | 7839.000000 | 103786.000000 | 140.000000 | 0.496000 | 991.000000 | 0.000000 | 7.553042 |
| 189 | Tunisia | Africa | 245405.000000 | 225033.000000 | 8526.000000 | 0.341000 | 6861.000000 | 0.000000 | 109.052894 |
| 190 | Turkey | Asia | 2992694.000000 | 35787480.000000 | 29959.000000 | 0.191000 | 13029754.000000 | 5013676.000000 | 8.362405 |
| 191 | Turks and Caicos Islands | North America | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6433.000000 | 0.000000 | nan |
| 192 | Uganda | Africa | 40651.000000 | 910045.000000 | 334.000000 | 0.162000 | 13027.000000 | 0.000000 | 4.466922 |
| 193 | Ukraine | Europe | 1584972.000000 | 7573737.000000 | 31344.000000 | 0.504000 | 108310.000000 | 1.000000 | 20.927212 |
| 194 | United Arab Emirates | Asia | 438638.000000 | 34913667.000000 | 1433.000000 | 0.024000 | 7181056.000000 | 2187849.000000 | 1.256350 |
| 195 | United Kingdom | Europe | 4304839.000000 | 107584947.000000 | 126359.000000 | 0.300000 | 28985958.000000 | 2132551.000000 | 4.001340 |
| 196 | United States | North America | 29785285.000000 | 355058178.000000 | 541926.000000 | 0.202000 | 121441497.000000 | 43036818.000000 | 8.388846 |
| 197 | Uruguay | South America | 79923.000000 | 1174028.000000 | 776.000000 | 0.129000 | 311282.000000 | 0.000000 | 6.807589 |
| 198 | Uzbekistan | Asia | 81339.000000 | 0.000000 | 622.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 199 | Vanuatu | Oceania | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 200 | Vatican | Europe | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 201 | Venezuela | South America | 150306.000000 | 0.000000 | 1483.000000 | 0.000000 | 12194.000000 | 0.000000 | nan |
| 202 | Vietnam | Asia | 2572.000000 | 1469955.000000 | 35.000000 | 0.016000 | 30971.000000 | 0.000000 | 0.174971 |
| 203 | Yemen | Asia | 3278.000000 | 0.000000 | 737.000000 | 0.000000 | 0.000000 | 0.000000 | nan |
| 204 | Zambia | Africa | 86273.000000 | 1180598.000000 | 1178.000000 | 0.301000 | 0.000000 | 0.000000 | 7.307568 |
| 205 | Zimbabwe | Africa | 36662.000000 | 412342.000000 | 1510.000000 | 0.287000 | 42210.000000 | 0.000000 | 8.891163 |
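The row-by-row loop that fills "test per confirmed(%)" can also be written as a single vectorized expression. The following is a minimal sketch of that alternative, not the notebook's own code: `sample` is a hypothetical three-row frame whose values are copied from the table above, and zero tests are masked to NaN before dividing so that those rows stay NaN, as in the loop. Note that the column holds confirmed cases per 100 tests, despite its name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the values are copied from three rows of the table above.
sample = pd.DataFrame(
    {
        "total_cases": [56093.0, 120541.0, 11481.0],
        "total_tests": [0.0, 497742.0, 162071.0],
    },
    index=["Afghanistan", "Albania", "Andorra"],
)

# Replace zero test counts with NaN so the division yields NaN instead of inf.
tests = sample["total_tests"].where(sample["total_tests"] > 0)

# Confirmed cases as a percentage of tests, computed for all rows at once.
sample["test per confirmed(%)"] = sample["total_cases"] / tests * 100

print(sample)
```

In the notebook, the same two lines could be applied to `df_loc` in place of the `iterrows` loop.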
You can click each country and see the number representing the spread of the virus.
fig = px.choropleth(covid_df, locations="location",
color=np.log(covid_df["total_cases"]),
locationmode="country names", hover_name="location",
animation_frame=covid_df["date"],
title='Cases over time', color_continuous_scale=px.colors.sequential.matter)
#fig.update(layout_coloraxis_showscale=False)
fig.show()
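The map above colours each country by `np.log(total_cases)`. One thing to keep in mind, sketched below on made-up numbers, is that a row with zero cases yields `-inf` under a plain logarithm, whereas `np.log1p` keeps it finite. The series `toy_cases` is hypothetical and is not taken from the dataset; whether `covid_df` actually contains zero rows is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative case counts, including a zero (not real data).
toy_cases = pd.Series([0, 9, 99, 999], name="total_cases")

# Plain log: zero becomes -inf (NumPy would also emit a RuntimeWarning).
with np.errstate(divide="ignore"):
    plain_log = np.log(toy_cases)

# log1p computes log(1 + x), so zero maps to zero and stays finite.
shifted_log = np.log1p(toy_cases)

print(plain_log.tolist())
print(shifted_log.round(3).tolist())
```

If the colour scale looks wrong for countries with no reported cases, swapping in `np.log1p` for the `color` argument may be worth trying.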
def plot_map(df, col, pal):
    fig = px.choropleth(df, locations="location", locationmode='country names',
                        color=col, hover_name="location",
                        title=col, hover_data=[col], color_continuous_scale=pal)
    # fig.update_layout(coloraxis_showscale=False)
    fig.show()
covid_deaths=covid_df[["continent","location","total_cases","date","total_deaths","total_deaths_per_million","total_cases_per_million","total_vaccinations"]]
df=covid_deaths.dropna(axis=0)
df_data=df.groupby(['location']).max()
df_data.drop(["date"],axis=1,inplace=True)
df_data.reset_index(inplace=True)
#df_data.drop(index=171,inplace=True)
df_data
df_data[df_data["continent"]=="Africa"].sum()
location                    AlgeriaAngolaBeninBotswanaBurkina FasoBurundiC...
continent                   AfricaAfricaAfricaAfricaAfricaAfricaAfricaAfri...
total_cases                 4.09742e+06
total_deaths                109674
total_deaths_per_million    5353.83
total_cases_per_million     287182
total_vaccinations          8.19935e+06
dtype: object
For African countries, the number of confirmed cases is lower than on other continents. I suspect this is because the number of tests performed is quite low.
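One way to probe the testing explanation is to compare tests per thousand people and tests per confirmed case across continents. The sketch below uses entirely made-up numbers (the frame `toy` and its values are hypothetical); only the column names `continent`, `total_cases`, `total_tests` and `population` follow the dataset.

```python
import pandas as pd

# Hypothetical per-country totals; these are NOT real figures.
toy = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe"],
    "location": ["A", "B", "C", "D"],
    "total_cases": [1000, 3000, 20000, 60000],
    "total_tests": [10000, 20000, 400000, 1600000],
    "population": [1_000_000, 4_000_000, 2_000_000, 8_000_000],
})

# Sum the raw counts per continent first, then form the ratios,
# so that large and small countries are weighted by their size.
by_continent = toy.groupby("continent")[["total_cases", "total_tests", "population"]].sum()
by_continent["tests_per_thousand"] = by_continent["total_tests"] / by_continent["population"] * 1000
by_continent["tests_per_case"] = by_continent["total_tests"] / by_continent["total_cases"]

print(by_continent[["tests_per_thousand", "tests_per_case"]])
```

A low `tests_per_thousand` together with a low `tests_per_case` would be consistent with under-detection, although it would not prove it.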
You can click each country and see the number of the total confirmed cases.
plot_map(df_data,'total_cases', 'matter')
We can see that the US, Brazil and India stand out.
You can click each country and see the number of the total deaths.
plot_map(df_data,'total_deaths', 'matter')
We can see that the US, Brazil, Mexico and India stand out.
You can click each country and see the number of the total deaths per million.
plot_map(df_data,'total_deaths_per_million', 'matter')
def plot_hbar(df, col, n, hover_data=[]):
    fig = px.bar(df.sort_values(col).tail(n),
                 x=col, y="location", color='continent',
                 text=col, orientation='h', width=700, hover_data=hover_data,
                 color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.update_layout(title=col, xaxis_title="", yaxis_title="",
                      yaxis_categoryorder = 'total ascending',
                      uniformtext_minsize=8, uniformtext_mode='hide')
    fig.show()
plot_hbar(df_data, 'total_cases', 15)
plot_hbar(df_data, 'total_deaths', 15)
plot_hbar(df_data, 'total_deaths_per_million', 15)
plot_hbar(df_covid, "total_tests", 15)
plot_hbar(df_covid,"total_vaccinations", 15)
plot_hbar(df_covid,"people_fully_vaccinated", 15)
We use a treemap, a visualization technique that displays hierarchical data as nested rectangles, so that many elements can be shown together with their relative sizes.
def plot_treemap(col):
    fig = px.treemap(df_data, path=["location"], values=col, height=700,
                     title=col, color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.data[0].textinfo = 'label+text+value'
    fig.show()
def plot_treemap_(col):
    fig = px.treemap(df_covid, path=["location"], values=col, height=700,
                     title=col, color_discrete_sequence = px.colors.qualitative.Dark2)
    fig.data[0].textinfo = 'label+text+value'
    fig.show()
plot_treemap('total_cases')
plot_treemap('total_deaths')
plot_treemap_('total_tests')
plot_treemap_('test per confirmed(%)')
plot_treemap_('total_vaccinations')
plot_treemap_('people_fully_vaccinated')
covid_df['death_rate'] = (covid_df['new_deaths_smoothed_per_million'] / covid_df['new_cases_smoothed_per_million']).replace(np.inf,np.nan)
covid_df['population_coverage'] = covid_df['total_tests'] / covid_df['population']
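The `death_rate` column divides smoothed daily deaths by smoothed daily cases, so days with zero cases need care. The sketch below, on a hypothetical three-row frame `toy`, shows what happens to `0/0` and `x/0` and why infinities are replaced before averaging. It relates same-day deaths to same-day cases, so it is best read as a rough ratio and not as a case fatality rate.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: (0 deaths, 0 cases), (deaths but 0 cases), (normal day).
toy = pd.DataFrame({
    "new_deaths_smoothed_per_million": [0.0, 0.5, 0.2],
    "new_cases_smoothed_per_million": [0.0, 0.0, 10.0],
})

ratio = toy["new_deaths_smoothed_per_million"] / toy["new_cases_smoothed_per_million"]
# 0/0 is already NaN; x/0 is inf and is turned into NaN here,
# so that a later .mean() skips it instead of returning inf.
toy["death_rate"] = ratio.replace([np.inf, -np.inf], np.nan)

print(toy)
print("mean death rate:", toy["death_rate"].mean())
```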
trace1 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=covid_df.groupby(['date'])['new_deaths_smoothed_per_million'].mean(),
xaxis='x2',
yaxis='y2',
name = "mean new deaths smoothed per million"
)
trace2 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=covid_df.groupby(['date'])['new_tests_smoothed_per_thousand'].mean(),
name = "mean new tests smoothed per thousand"
)
trace3 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=(covid_df.groupby(['date'])['death_rate'].mean().replace([np.inf],np.nan).interpolate(method='linear', limit_direction='forward', axis=0) * 100).round(3),
xaxis='x3',
yaxis='y3',
name = "interpolated death rate %"
)
trace4 = go.Scatter(
x=covid_df.groupby(['date'])['date'].apply(lambda x: np.unique(x)[0]),
y=((covid_df.groupby(['date'])['new_cases_per_million'].apply(lambda x: np.mean(x/1e+6))) * 100).round(6),
xaxis='x4',
yaxis='y4',
name = "mean covid population d2d coverage %"
)
data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
xaxis=dict(
domain=[0, 0.45]
),
yaxis=dict(
domain=[0, 0.45]
),
xaxis2=dict(
domain=[0.55, 1]
),
xaxis3=dict(
domain=[0, 0.45],
anchor='y3'
),
xaxis4=dict(
domain=[0.55, 1],
anchor='y4'
),
yaxis2=dict(
domain=[0, 0.45],
anchor='x2'
),
yaxis3=dict(
domain=[0.55, 1]
),
yaxis4=dict(
domain=[0.55, 1],
anchor='x4'
),
title = 'Mean new deaths per 1M, new tests per 1K, death rate and covid mean coverage'
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)
We use line plots to show the day-by-day trend and to display several continents together.
def plot_line(col,title):
    trace1 = go.Scatter(
        x = covid_df[(covid_df['continent']=='Asia')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='Asia')].groupby(['date','continent'])[col].sum(),
        mode = "lines",
        name = "Asia",
        marker = dict(color = 'green'),
    )
    trace2 = go.Scatter(
        x = covid_df[(covid_df['continent']=='Europe')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='Europe')].groupby(['date','continent'])[col].sum(),
        mode = "lines",
        name = "Europe",
        marker = dict(color = 'red'),
    )
    trace3 = go.Scatter(
        x = covid_df[(covid_df['continent']=='Africa')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='Africa')].groupby(['date','continent'])[col].sum(),
        mode = "lines",
        name = "Africa",
        marker = dict(color = 'blue'),
    )
    trace4 = go.Scatter(
        x = covid_df[(covid_df['continent']=='North America')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='North America')].groupby(['date','continent'])[col].sum(),
        mode = "lines",
        name = "North America",
        marker = dict(color = 'black'),
    )
    trace5 = go.Scatter(
        x = covid_df[(covid_df['continent']=='South America')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='South America')].groupby(['date','continent'])[col].sum(),
        mode = "lines",
        name = "South America",
        marker = dict(color = 'brown'),
    )
    data = [trace1,trace2,trace3,trace4,trace5]
    layout = dict(title = title,
                  xaxis= dict(title= "#{} day by day".format(title),ticklen= 5,zeroline= False)
                 )
    fig = dict(data = data, layout = layout)
    iplot(fig)
plot_line('new_deaths_smoothed','New Deaths Smoothed')
plot_line('new_vaccinations_smoothed','new vaccinations smoothed')
plot_line('total_vaccinations','total_vaccinations')
plot_line('new_tests_smoothed','New tests smoothed')
plot_line('positive_rate','positive_rate')
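The per-continent traces above all repeat the same filter-then-group pattern. As a sketch of an alternative, the daily sums for every continent can be computed in one step with a pivot table. The helper `continent_daily_sum` and the frame `toy` are hypothetical names introduced here for illustration; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical long-format data: two dates, three countries, two continents.
toy = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-01",
             "2021-01-02", "2021-01-02", "2021-01-02"],
    "continent": ["Asia", "Asia", "Europe", "Asia", "Asia", "Europe"],
    "location": ["A", "B", "C", "A", "B", "C"],
    "new_deaths_smoothed": [1.0, 2.0, 5.0, 3.0, 4.0, 6.0],
})

def continent_daily_sum(df, col):
    """Return a date x continent table holding the daily sum of `col`."""
    return df.pivot_table(index="date", columns="continent",
                          values=col, aggfunc="sum")

wide = continent_daily_sum(toy, "new_deaths_smoothed")
print(wide)
```

Each column of `wide` could then back one `go.Scatter` trace, with `wide.index` as the shared x values.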
def plot_line_mean(col,title):
    # Daily mean of `col` per continent from 2020-03-01 onwards, shown as a percentage.
    trace1 = go.Scatter(
        x = covid_df[(covid_df['continent']=='Asia')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='Asia')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])[col].mean()*100,
        mode = "lines",
        name = "Asia",
        marker = dict(color = 'green'),
    )
    trace2 = go.Scatter(
        x = covid_df[(covid_df['continent']=='Europe')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='Europe')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])[col].mean()*100,
        mode = "lines",
        name = "Europe",
        marker = dict(color = 'red'),
    )
    trace3 = go.Scatter(
        x = covid_df[(covid_df['continent']=='Africa')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='Africa')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])[col].mean()*100,
        mode = "lines",
        name = "Africa",
        marker = dict(color = 'blue'),
    )
    trace4 = go.Scatter(
        x = covid_df[(covid_df['continent']=='North America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        y = covid_df[(covid_df['continent']=='North America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])[col].mean()*100,
        mode = "lines",
        name = "North America",
        marker = dict(color = 'black'),
    )
    trace5 = go.Scatter(
        x = covid_df[(covid_df['continent']=='South America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
        # Scaled by 100 like the other continents so all traces share the same unit.
        y = covid_df[(covid_df['continent']=='South America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])[col].mean()*100,
        mode = "lines",
        name = "South America",
        marker = dict(color = 'brown'),
    )
    data = [trace1,trace2,trace3,trace4,trace5]
    layout = dict(title = title,
                  xaxis= dict(title= 'mean deaths/cases %',ticklen= 5,zeroline= False)
                 )
    fig = dict(data = data, layout = layout)
    iplot(fig)
plot_line_mean('death_rate','Mean death rate over continents')
plot_line_mean('population_coverage','Mean population test coverage over continents')
trace1 = go.Scatter(
x = covid_df[(covid_df['continent']=='Asia')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
y = covid_df[(covid_df['continent']=='Asia')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['population_coverage'].mean()*100,
mode = "lines",
name = "Asia",
marker = dict(color = 'green'),
)
trace2 = go.Scatter(
x = covid_df[(covid_df['continent']=='Europe')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
y = covid_df[(covid_df['continent']=='Europe')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['population_coverage'].mean()*100,
mode = "lines",
name = "Europe",
marker = dict(color = 'red'),
)
trace3 = go.Scatter(
x = covid_df[(covid_df['continent']=='Africa')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
y = covid_df[(covid_df['continent']=='Africa')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['population_coverage'].mean()*100,
mode = "lines",
name = "Africa",
marker = dict(color = 'blue'),
)
trace4 = go.Scatter(
x = covid_df[(covid_df['continent']=='North America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
y = covid_df[(covid_df['continent']=='North America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['population_coverage'].mean()*100,
mode = "lines",
name = "North America",
marker = dict(color = 'black'),
)
trace5 = go.Scatter(
x = covid_df[(covid_df['continent']=='South America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['date'].apply(lambda x: np.unique(x)[0]),
y = covid_df[(covid_df['continent']=='South America')&(covid_df['date']>='2020-03-01')].groupby(['date','continent'])['population_coverage'].mean()*100,
mode = "lines",
name = "South America",
marker = dict(color = 'brown'),
)
data = [trace1,trace2,trace3,trace4,trace5]
layout = dict(title = 'Mean population test coverage over continents',
xaxis= dict(title= 'mean tests/population %',ticklen= 5,zeroline= False)
)
fig = dict(data = data, layout = layout)
iplot(fig)
covid_df_grouped = covid_df.groupby(['location','continent']).agg({'new_deaths': np.sum, 'gdp_per_capita': np.mean, 'new_cases':np.sum}).reset_index()
covid_df_grouped = covid_df_grouped[(~covid_df_grouped['new_deaths'].isnull())&(~covid_df_grouped['new_cases'].isnull())&(~covid_df_grouped['gdp_per_capita'].isnull())&(~covid_df_grouped['continent'].isnull())]
fig = px.scatter(covid_df_grouped,
x="new_deaths", y="gdp_per_capita", size="new_cases", color="continent",
hover_name="location", log_x=True, size_max=60)
fig.show()
covid_df_grouped = covid_df.groupby(['location','continent']).agg({'handwashing_facilities': np.mean, 'new_deaths_smoothed_per_million': np.sum, 'extreme_poverty':np.mean}).reset_index()
covid_df_grouped = covid_df_grouped[(~covid_df_grouped['handwashing_facilities'].isnull())&(~covid_df_grouped['new_deaths_smoothed_per_million'].isnull())&(~covid_df_grouped['extreme_poverty'].isnull())&(~covid_df_grouped['continent'].isnull())]
fig = px.scatter(covid_df_grouped,
x="new_deaths_smoothed_per_million", y="handwashing_facilities", size="extreme_poverty", color="continent",
hover_name="location", log_x=True, size_max=60)
fig.show()
covid_df_grouped = covid_df.groupby(['location','continent']).agg({'population_density': np.mean, 'new_deaths_smoothed_per_million': np.sum, 'aged_70_older':np.mean}).reset_index()
covid_df_grouped = covid_df_grouped[(~covid_df_grouped['population_density'].isnull())&(~covid_df_grouped['new_deaths_smoothed_per_million'].isnull())&(~covid_df_grouped['aged_70_older'].isnull())&(~covid_df_grouped['continent'].isnull())]
fig = px.scatter(covid_df_grouped,
x="new_deaths_smoothed_per_million", y="aged_70_older", size="population_density", color="continent",
hover_name="location", log_x=True, size_max=60)
fig.show()
covid_df_grouped = covid_df.groupby(['location','continent']).agg({'life_expectancy': np.mean, 'new_deaths_smoothed_per_million': np.sum, 'hospital_beds_per_thousand':np.mean}).reset_index()
covid_df_grouped = covid_df_grouped[(~covid_df_grouped['life_expectancy'].isnull())&(~covid_df_grouped['new_deaths_smoothed_per_million'].isnull())&(~covid_df_grouped['hospital_beds_per_thousand'].isnull())&(~covid_df_grouped['continent'].isnull())]
fig = px.scatter(covid_df_grouped,
x="new_deaths_smoothed_per_million", y="life_expectancy", size="hospital_beds_per_thousand", color="continent",
hover_name="location", log_x=True, size_max=60)
fig.show()
covid_df_grouped = covid_df.groupby(['location','continent']).agg({'death_rate': np.mean, 'stringency_index': np.mean, 'new_cases':np.sum}).reset_index()
covid_df_grouped = covid_df_grouped[(~covid_df_grouped['death_rate'].isnull())&(~covid_df_grouped['stringency_index'].isnull())&(~covid_df_grouped['new_cases'].isnull())&(~covid_df_grouped['continent'].isnull())]
fig = px.scatter(covid_df_grouped,
x="death_rate", y="stringency_index", size="new_cases", color="continent",
hover_name="location", log_x=True, size_max=60)
fig.show()
# Work on a copy so the correlation analysis does not modify covid_df.
covid_df_copy = covid_df.copy()
covid_df_copy.head(10)
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | death_rate | population_coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
| 6 | AFG | Asia | Afghanistan | 2020-03-01 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
| 7 | AFG | Asia | Afghanistan | 2020-03-02 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 |
| 8 | AFG | Asia | Afghanistan | 2020-03-03 | 2.0 | 1.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
| 9 | AFG | Asia | Afghanistan | 2020-03-04 | 4.0 | 2.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 597.029 | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 |
10 rows × 61 columns
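# Restrict to numeric columns first; recent pandas versions raise an error if corr() meets text columns.
correlations = covid_df_copy.select_dtypes(include='number').corr()['total_cases'].abs().sort_values(ascending=False).drop('total_cases',axis=0).to_frame()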
correlations.plot(kind='bar',figsize=(12,10));
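The bar chart ranks features by the absolute value of their Pearson correlation with `total_cases`. The sketch below repeats that ranking on a tiny hypothetical frame `toy` (made-up values, dataset-style column names) to show one consequence of taking `.abs()`: a perfectly negative correlation ranks as high as a perfectly positive one.

```python
import pandas as pd

# Hypothetical data: deaths rise with cases, stringency falls with cases.
toy = pd.DataFrame({
    "location": ["A", "B", "C", "D"],
    "total_cases": [1.0, 2.0, 3.0, 4.0],
    "total_deaths": [2.0, 4.0, 6.0, 8.0],
    "stringency_index": [4.0, 3.0, 2.0, 1.0],
    "median_age": [1.0, 3.0, 2.0, 4.0],
})

# Keep numeric columns only, then rank by absolute correlation with total_cases.
numeric = toy.select_dtypes(include="number")
signed = numeric.corr()["total_cases"].drop("total_cases")
ranking = signed.abs().sort_values(ascending=False)

print(signed)
print(ranking)
```

Keeping the signed series next to the ranking makes it easier to tell which relationships are inverse.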
# Function to compute the pairwise correlation of the features
def corr(df):
    """Return the pairwise correlation matrix of the numeric columns of df."""
    return df.select_dtypes(include='number').corr()
corr(covid_df_copy).style.background_gradient(cmap="CMRmap_r")
| | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | total_cases_per_million | new_cases_per_million | new_cases_smoothed_per_million | total_deaths_per_million | new_deaths_per_million | new_deaths_smoothed_per_million | reproduction_rate | icu_patients | icu_patients_per_million | hosp_patients | hosp_patients_per_million | weekly_icu_admissions | weekly_icu_admissions_per_million | weekly_hosp_admissions | weekly_hosp_admissions_per_million | new_tests | total_tests | total_tests_per_thousand | new_tests_per_thousand | new_tests_smoothed | new_tests_smoothed_per_thousand | positive_rate | tests_per_case | total_vaccinations | people_vaccinated | people_fully_vaccinated | new_vaccinations | new_vaccinations_smoothed | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | new_vaccinations_smoothed_per_million | stringency_index | population | population_density | median_age | aged_65_older | aged_70_older | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | death_rate | population_coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| total_cases | 1.000000 | 0.874162 | 0.888154 | 0.983313 | 0.848058 | 0.876899 | 0.140310 | 0.067746 | 0.082189 | 0.169975 | 0.071839 | 0.095372 | -0.053677 | 0.167050 | 0.044366 | 0.155312 | 0.019911 | 0.003233 | -0.001386 | 0.060245 | 0.007456 | 0.176094 | 0.210690 | 0.008818 | -0.002867 | 0.184394 | -0.003935 | -0.031146 | -0.016797 | 0.709946 | 0.716938 | 0.637546 | 0.699287 | 0.766562 | 0.061655 | 0.069220 | 0.045063 | 0.048157 | -0.143389 | 0.603536 | -0.022085 | -0.076957 | -0.028515 | -0.028776 | -0.023861 | -0.030544 | -0.103823 | -0.060493 | -0.016033 | -0.028103 | 0.003269 | -0.042292 | -0.181257 | -0.101104 | -0.006046 | 0.008587 |
| new_cases | 0.874162 | 1.000000 | 0.988770 | 0.894524 | 0.926816 | 0.923409 | 0.113827 | 0.112570 | 0.120201 | 0.146790 | 0.095649 | 0.114138 | -0.051484 | 0.207915 | 0.064766 | 0.198959 | 0.036703 | 0.003244 | -0.002144 | 0.064484 | 0.008696 | 0.197580 | 0.172913 | 0.000945 | -0.001406 | 0.200244 | -0.003066 | -0.021606 | -0.019654 | 0.388810 | 0.397605 | 0.328983 | 0.422973 | 0.454275 | 0.025148 | 0.029821 | 0.015983 | 0.018978 | -0.158905 | 0.661523 | -0.025341 | -0.086924 | -0.028955 | -0.029078 | -0.025403 | -0.036756 | -0.119975 | -0.072198 | -0.014383 | -0.031940 | -0.000706 | -0.045197 | -0.207708 | -0.115344 | -0.007221 | 0.000681 |
| new_cases_smoothed | 0.888154 | 0.988770 | 1.000000 | 0.906475 | 0.918715 | 0.939357 | 0.117262 | 0.102323 | 0.122074 | 0.149869 | 0.092143 | 0.118265 | -0.053868 | 0.212243 | 0.065476 | 0.202595 | 0.036813 | 0.007140 | 0.000651 | 0.078322 | 0.013282 | 0.197794 | 0.178206 | 0.001922 | -0.001699 | 0.204380 | -0.002600 | -0.021468 | -0.019736 | 0.392009 | 0.401355 | 0.328814 | 0.425698 | 0.463595 | 0.026080 | 0.030955 | 0.016457 | 0.020343 | -0.159214 | 0.664703 | -0.025452 | -0.087217 | -0.029013 | -0.029132 | -0.025329 | -0.036887 | -0.120503 | -0.072394 | -0.014427 | -0.032033 | -0.000643 | -0.045448 | -0.208477 | -0.115709 | -0.007022 | 0.001657 |
| total_deaths | 0.983313 | 0.894524 | 0.906475 | 1.000000 | 0.881153 | 0.910655 | 0.132417 | 0.064787 | 0.078132 | 0.187885 | 0.077617 | 0.103195 | -0.059426 | 0.141831 | 0.037020 | 0.134571 | 0.017193 | 0.002716 | -0.003176 | 0.050593 | 0.005673 | 0.145829 | 0.163375 | -0.002184 | -0.008226 | 0.152508 | -0.010534 | -0.026620 | -0.018823 | 0.665703 | 0.677479 | 0.602474 | 0.653199 | 0.716121 | 0.053589 | 0.062107 | 0.038132 | 0.039971 | -0.160678 | 0.640373 | -0.025668 | -0.087175 | -0.030545 | -0.029817 | -0.028874 | -0.037296 | -0.123561 | -0.072404 | -0.017543 | -0.035624 | 0.004469 | -0.049499 | -0.205350 | -0.114825 | -0.001535 | -0.002447 |
| new_deaths | 0.848058 | 0.926816 | 0.918715 | 0.881153 | 1.000000 | 0.971499 | 0.106557 | 0.085865 | 0.096648 | 0.153542 | 0.140592 | 0.147183 | -0.058144 | 0.153560 | 0.054363 | 0.150057 | 0.033229 | 0.000592 | -0.003276 | 0.028982 | 0.002989 | 0.143001 | 0.127968 | -0.009556 | -0.008984 | 0.140609 | -0.013168 | -0.011568 | -0.021013 | 0.414428 | 0.429076 | 0.352333 | 0.456267 | 0.487689 | 0.027582 | 0.034202 | 0.017284 | 0.020900 | -0.166023 | 0.679298 | -0.027743 | -0.093805 | -0.032032 | -0.031190 | -0.031716 | -0.040879 | -0.131013 | -0.078984 | -0.016850 | -0.035987 | 0.003875 | -0.049041 | -0.222544 | -0.124281 | 0.008762 | -0.009840 |
| new_deaths_smoothed | 0.876899 | 0.923409 | 0.939357 | 0.910655 | 0.971499 | 1.000000 | 0.111334 | 0.079456 | 0.097269 | 0.160171 | 0.115682 | 0.151781 | -0.061976 | 0.156295 | 0.054274 | 0.152396 | 0.032897 | 0.006190 | -0.001283 | 0.056241 | 0.009894 | 0.139525 | 0.134119 | -0.008888 | -0.010318 | 0.145490 | -0.013247 | -0.012398 | -0.021468 | 0.434847 | 0.450076 | 0.370088 | 0.467670 | 0.511886 | 0.029955 | 0.036991 | 0.018918 | 0.023029 | -0.169794 | 0.695507 | -0.028383 | -0.095988 | -0.032732 | -0.031858 | -0.032321 | -0.041816 | -0.134150 | -0.080795 | -0.017246 | -0.036835 | 0.004048 | -0.050265 | -0.227720 | -0.127139 | 0.009072 | -0.009179 |
| total_cases_per_million | 0.140310 | 0.113827 | 0.117262 | 0.132417 | 0.106557 | 0.111334 | 1.000000 | 0.588534 | 0.706893 | 0.830203 | 0.437590 | 0.580378 | 0.113436 | 0.189610 | 0.379739 | 0.222291 | 0.379806 | 0.082057 | 0.112887 | 0.074514 | 0.094312 | 0.184743 | 0.195524 | 0.457347 | 0.306539 | 0.201747 | 0.387508 | 0.200742 | -0.042170 | 0.103211 | 0.123252 | 0.120889 | 0.094056 | 0.107851 | 0.250308 | 0.261211 | 0.199877 | 0.238136 | 0.047408 | -0.032864 | 0.027198 | 0.128905 | 0.191239 | 0.182706 | 0.263809 | -0.194549 | -0.162586 | 0.042081 | 0.290454 | 0.090835 | -0.119412 | 0.137358 | 0.125502 | 0.163102 | -0.026569 | 0.456945 |
| new_cases_per_million | 0.067746 | 0.112570 | 0.102323 | 0.064787 | 0.085865 | 0.079456 | 0.588534 | 1.000000 | 0.845264 | 0.479247 | 0.525415 | 0.573209 | 0.161570 | 0.162670 | 0.380001 | 0.204918 | 0.422492 | 0.050180 | 0.074757 | 0.053348 | 0.077163 | 0.137356 | 0.105801 | 0.279451 | 0.248827 | 0.145248 | 0.293487 | 0.274428 | -0.044086 | 0.022902 | 0.030287 | 0.026165 | 0.025943 | 0.028801 | 0.128378 | 0.135856 | 0.093364 | 0.139727 | 0.060453 | -0.029948 | 0.027421 | 0.130760 | 0.201507 | 0.191215 | 0.186934 | -0.156516 | -0.110503 | 0.022212 | 0.277451 | 0.088948 | -0.110486 | 0.145796 | 0.104595 | 0.141638 | -0.031092 | 0.278986 |
| new_cases_smoothed_per_million | 0.082189 | 0.120201 | 0.122074 | 0.078132 | 0.096648 | 0.097269 | 0.706893 | 0.845264 | 1.000000 | 0.573430 | 0.548876 | 0.696430 | 0.172021 | 0.192666 | 0.453811 | 0.241222 | 0.503844 | 0.100595 | 0.144452 | 0.083918 | 0.134118 | 0.160067 | 0.128573 | 0.335165 | 0.282043 | 0.174152 | 0.348233 | 0.324376 | -0.051688 | 0.028140 | 0.037136 | 0.031701 | 0.030964 | 0.036003 | 0.152240 | 0.158618 | 0.112353 | 0.166602 | 0.076521 | -0.034918 | 0.032288 | 0.152835 | 0.235662 | 0.223876 | 0.219669 | -0.183398 | -0.130390 | 0.026003 | 0.324493 | 0.104041 | -0.129319 | 0.169874 | 0.122631 | 0.166580 | -0.035884 | 0.334678 |
| total_deaths_per_million | 0.169975 | 0.146790 | 0.149869 | 0.187885 | 0.153542 | 0.160171 | 0.830203 | 0.479247 | 0.573430 | 1.000000 | 0.470466 | 0.627766 | 0.113382 | 0.209748 | 0.394161 | 0.268544 | 0.407010 | 0.073335 | 0.081277 | 0.081456 | 0.089775 | 0.206481 | 0.196909 | 0.284645 | 0.213855 | 0.227330 | 0.266208 | 0.246467 | -0.049904 | 0.109736 | 0.134283 | 0.124896 | 0.101472 | 0.115260 | 0.164580 | 0.193320 | 0.123952 | 0.155402 | 0.067364 | -0.017309 | -0.024086 | 0.137055 | 0.261580 | 0.264115 | 0.200522 | -0.190581 | -0.210793 | -0.044598 | 0.319525 | 0.030308 | -0.101904 | 0.143193 | 0.087003 | 0.131226 | 0.011712 | 0.284074 |
| new_deaths_per_million | 0.071839 | 0.095649 | 0.092143 | 0.077617 | 0.140592 | 0.115682 | 0.437590 | 0.525415 | 0.548876 | 0.470466 | 1.000000 | 0.761659 | 0.095104 | 0.147454 | 0.384704 | 0.201959 | 0.455957 | 0.042898 | 0.068872 | 0.034135 | 0.075834 | 0.113285 | 0.086355 | 0.169078 | 0.171336 | 0.116431 | 0.197733 | 0.278953 | -0.038170 | 0.029405 | 0.039161 | 0.031898 | 0.033830 | 0.036462 | 0.059193 | 0.072336 | 0.041953 | 0.070315 | 0.087141 | -0.016930 | -0.004708 | 0.125312 | 0.204168 | 0.203046 | 0.111495 | -0.125432 | -0.094610 | -0.017334 | 0.239894 | 0.051983 | -0.071446 | 0.133625 | 0.060635 | 0.107483 | 0.041898 | 0.168618 |
| new_deaths_smoothed_per_million | 0.095372 | 0.114138 | 0.118265 | 0.103195 | 0.147183 | 0.151781 | 0.580378 | 0.573209 | 0.696430 | 0.627766 | 0.761659 | 1.000000 | 0.111029 | 0.188483 | 0.493884 | 0.258051 | 0.585551 | 0.088153 | 0.116361 | 0.077859 | 0.128290 | 0.139507 | 0.116111 | 0.224733 | 0.215499 | 0.153056 | 0.257801 | 0.352966 | -0.049257 | 0.040672 | 0.053856 | 0.044109 | 0.044128 | 0.050147 | 0.080723 | 0.096940 | 0.058043 | 0.092980 | 0.116187 | -0.021857 | -0.005661 | 0.162006 | 0.264513 | 0.263256 | 0.145045 | -0.162764 | -0.124260 | -0.022725 | 0.310304 | 0.066786 | -0.092489 | 0.172452 | 0.078847 | 0.139706 | 0.050747 | 0.224186 |
| reproduction_rate | -0.053677 | -0.051484 | -0.053868 | -0.059426 | -0.058144 | -0.061976 | 0.113436 | 0.161570 | 0.172021 | 0.113382 | 0.095104 | 0.111029 | 1.000000 | 0.038855 | 0.091811 | 0.049207 | 0.098755 | 0.025354 | 0.022764 | 0.017967 | 0.026981 | 0.074027 | 0.050926 | 0.127212 | 0.113066 | 0.080627 | 0.134845 | 0.279054 | -0.001919 | -0.032688 | -0.032454 | -0.029463 | -0.032368 | -0.034495 | -0.024869 | -0.026788 | -0.025987 | -0.056890 | 0.418368 | -0.042066 | -0.022857 | 0.376137 | 0.294493 | 0.281492 | 0.180238 | -0.011607 | 0.127363 | 0.052727 | 0.237481 | 0.215312 | 0.071026 | 0.172320 | 0.333756 | 0.357612 | -0.039500 | 0.125472 |
| icu_patients | 0.167050 | 0.207915 | 0.212243 | 0.141831 | 0.153560 | 0.156295 | 0.189610 | 0.162670 | 0.192666 | 0.209748 | 0.147454 | 0.188483 | 0.038855 | 1.000000 | 0.458657 | 0.935068 | 0.274910 | 0.057584 | 0.024722 | 0.344632 | 0.083035 | 0.693271 | 0.678932 | 0.124732 | 0.082291 | 0.723640 | 0.104286 | 0.047774 | -0.010850 | 0.069731 | 0.087829 | 0.077317 | 0.074826 | 0.096462 | 0.048609 | 0.060465 | 0.033312 | 0.049908 | 0.072597 | 0.008329 | -0.012730 | 0.108314 | 0.152195 | 0.154094 | 0.146779 | -0.042952 | -0.068656 | 0.025804 | 0.143981 | 0.025479 | -0.070677 | 0.053769 | 0.062059 | 0.101677 | 0.002655 | 0.124600 |
| icu_patients_per_million | 0.044366 | 0.064766 | 0.065476 | 0.037020 | 0.054363 | 0.054274 | 0.379739 | 0.380001 | 0.453811 | 0.394161 | 0.384704 | 0.493884 | 0.091811 | 0.458657 | 1.000000 | 0.482915 | 0.707177 | 0.174015 | 0.195108 | 0.164129 | 0.207849 | 0.262502 | 0.221972 | 0.255339 | 0.227051 | 0.290548 | 0.276112 | 0.145644 | -0.022177 | 0.018971 | 0.027037 | 0.022048 | 0.020296 | 0.026192 | 0.098063 | 0.111768 | 0.088027 | 0.083225 | 0.144835 | -0.027614 | -0.020657 | 0.262480 | 0.363494 | 0.372616 | 0.263075 | -0.097072 | -0.116509 | -0.037622 | 0.346683 | 0.091937 | -0.155586 | 0.191935 | 0.143717 | 0.213604 | 0.016448 | 0.255071 |
| hosp_patients | 0.155312 | 0.198959 | 0.202595 | 0.134571 | 0.150057 | 0.152396 | 0.222291 | 0.204918 | 0.241222 | 0.268544 | 0.201959 | 0.258051 | 0.049207 | 0.935068 | 0.482915 | 1.000000 | 0.441252 | 0.079062 | 0.033106 | 0.329161 | 0.085865 | 0.672587 | 0.637165 | 0.145622 | 0.113784 | 0.691770 | 0.135677 | 0.084717 | -0.012767 | 0.063867 | 0.081964 | 0.068257 | 0.069302 | 0.090230 | 0.056429 | 0.072026 | 0.035746 | 0.059107 | 0.089009 | 0.002732 | -0.014845 | 0.137496 | 0.194530 | 0.195155 | 0.154903 | -0.053775 | -0.083234 | 0.000321 | 0.180451 | 0.036959 | -0.086607 | 0.067886 | 0.077994 | 0.120263 | 0.008314 | 0.145460 |
| hosp_patients_per_million | 0.019911 | 0.036703 | 0.036813 | 0.017193 | 0.033229 | 0.032897 | 0.379806 | 0.422492 | 0.503844 | 0.407010 | 0.455957 | 0.585551 | 0.098755 | 0.274910 | 0.707177 | 0.441252 | 1.000000 | 0.131836 | 0.166779 | 0.096579 | 0.152166 | 0.197234 | 0.141853 | 0.286767 | 0.311630 | 0.205804 | 0.358578 | 0.213709 | -0.022640 | 0.008063 | 0.013947 | 0.009229 | 0.009232 | 0.012564 | 0.084606 | 0.100773 | 0.068859 | 0.080413 | 0.130002 | -0.032953 | -0.025507 | 0.274021 | 0.376931 | 0.381943 | 0.218247 | -0.103530 | -0.067565 | -0.065286 | 0.374788 | 0.121144 | -0.161093 | 0.222383 | 0.137395 | 0.208631 | 0.014673 | 0.286501 |
| weekly_icu_admissions | 0.003233 | 0.003244 | 0.007140 | 0.002716 | 0.000592 | 0.006190 | 0.082057 | 0.050180 | 0.100595 | 0.073335 | 0.042898 | 0.088153 | 0.025354 | 0.057584 | 0.174015 | 0.079062 | 0.131836 | 1.000000 | 0.725509 | 0.220380 | 0.452138 | 0.001681 | -0.002840 | 0.008788 | 0.010118 | 0.034354 | 0.045339 | 0.024463 | -0.004593 | 0.001039 | 0.001522 | 0.002628 | -0.000898 | 0.001147 | 0.034184 | 0.036544 | 0.034657 | 0.027089 | 0.027721 | -0.005597 | -0.003803 | 0.046404 | 0.067425 | 0.067026 | 0.039510 | -0.019339 | -0.031350 | -0.012927 | 0.080540 | 0.026038 | -0.029040 | 0.039539 | 0.029031 | 0.039938 | 0.005702 | 0.008722 |
| weekly_icu_admissions_per_million | -0.001386 | -0.002144 | 0.000651 | -0.003176 | -0.003276 | -0.001283 | 0.112887 | 0.074757 | 0.144452 | 0.081277 | 0.068872 | 0.116361 | 0.022764 | 0.024722 | 0.195108 | 0.033106 | 0.166779 | 0.725509 | 1.000000 | 0.140506 | 0.585200 | 0.001016 | -0.001073 | 0.045667 | 0.031036 | 0.014018 | 0.071336 | 0.037657 | -0.004930 | 0.000457 | 0.000732 | 0.002406 | -0.000671 | -0.000036 | 0.056805 | 0.060279 | 0.058159 | 0.045318 | 0.027692 | -0.007795 | -0.003417 | 0.049389 | 0.070503 | 0.069327 | 0.039891 | -0.021763 | -0.018713 | -0.011976 | 0.078978 | 0.032478 | -0.032793 | 0.042805 | 0.029440 | 0.044893 | -0.001001 | 0.045602 |
| weekly_hosp_admissions | 0.060245 | 0.064484 | 0.078322 | 0.050593 | 0.028982 | 0.056241 | 0.074514 | 0.053348 | 0.083918 | 0.081456 | 0.034135 | 0.077859 | 0.017967 | 0.344632 | 0.164129 | 0.329161 | 0.096579 | 0.220380 | 0.140506 | 1.000000 | 0.470875 | 0.163707 | 0.249412 | 0.048597 | 0.018438 | 0.260891 | 0.041091 | 0.029051 | -0.004227 | 0.024469 | 0.030287 | 0.029554 | 0.026390 | 0.035353 | 0.018043 | 0.022374 | 0.012907 | 0.019192 | 0.026894 | 0.002160 | -0.004970 | 0.042488 | 0.058814 | 0.059004 | 0.050983 | -0.015782 | -0.020448 | 0.009414 | 0.057261 | 0.012390 | -0.027558 | 0.021923 | 0.023308 | 0.038155 | 0.001086 | 0.048545 |
| weekly_hosp_admissions_per_million | 0.007456 | 0.008696 | 0.013282 | 0.005673 | 0.002989 | 0.009894 | 0.094312 | 0.077163 | 0.134118 | 0.089775 | 0.075834 | 0.128290 | 0.026981 | 0.083035 | 0.207849 | 0.085865 | 0.152166 | 0.452138 | 0.585200 | 0.470875 | 1.000000 | 0.030735 | 0.042801 | 0.046025 | 0.027161 | 0.056558 | 0.060606 | 0.059068 | -0.005786 | 0.002859 | 0.004342 | 0.003761 | 0.002078 | 0.004336 | 0.020414 | 0.023955 | 0.017885 | 0.018123 | 0.029022 | -0.007501 | -0.005164 | 0.065574 | 0.089623 | 0.090200 | 0.043464 | -0.021773 | -0.007358 | -0.005046 | 0.093810 | 0.034457 | -0.038569 | 0.057214 | 0.031422 | 0.049089 | 0.000277 | 0.045947 |
| new_tests | 0.176094 | 0.197580 | 0.197794 | 0.145829 | 0.143001 | 0.139525 | 0.184743 | 0.137356 | 0.160067 | 0.206481 | 0.113285 | 0.139507 | 0.074027 | 0.693271 | 0.262502 | 0.672587 | 0.197234 | 0.001681 | 0.001016 | 0.163707 | 0.030735 | 1.000000 | 0.850537 | 0.231315 | 0.259000 | 0.959910 | 0.224101 | 0.051630 | -0.006248 | 0.080641 | 0.103696 | 0.083187 | 0.081093 | 0.097095 | 0.096496 | 0.103842 | 0.046838 | 0.071248 | 0.097247 | 0.055954 | -0.015365 | 0.120822 | 0.136676 | 0.131513 | 0.143164 | -0.035680 | -0.032787 | 0.058241 | 0.129709 | 0.059662 | -0.042721 | 0.046531 | 0.071982 | 0.118512 | -0.009751 | 0.231107 |
| total_tests | 0.210690 | 0.172913 | 0.178206 | 0.163375 | 0.127968 | 0.134119 | 0.195524 | 0.105801 | 0.128573 | 0.196909 | 0.086355 | 0.116111 | 0.050926 | 0.678932 | 0.221972 | 0.637165 | 0.141853 | -0.002840 | -0.001073 | 0.249412 | 0.042801 | 0.850537 | 1.000000 | 0.233888 | 0.129574 | 0.877601 | 0.149869 | 0.038318 | -0.000818 | 0.141127 | 0.173132 | 0.165012 | 0.138028 | 0.164505 | 0.118158 | 0.122362 | 0.073246 | 0.092846 | 0.072322 | 0.052154 | -0.011999 | 0.095470 | 0.101786 | 0.097218 | 0.116549 | -0.026701 | -0.016536 | 0.059215 | 0.094105 | 0.049853 | -0.031025 | 0.032103 | 0.055947 | 0.095064 | -0.008914 | 0.233722 |
| total_tests_per_thousand | 0.008818 | 0.000945 | 0.001922 | -0.002184 | -0.009556 | -0.008888 | 0.457347 | 0.279451 | 0.335165 | 0.284645 | 0.169078 | 0.224733 | 0.127212 | 0.124732 | 0.255339 | 0.145622 | 0.286767 | 0.008788 | 0.045667 | 0.048597 | 0.046025 | 0.231315 | 0.233888 | 1.000000 | 0.700227 | 0.234009 | 0.808760 | 0.045558 | 0.049268 | 0.025935 | 0.029039 | 0.026121 | 0.018542 | 0.023061 | 0.296303 | 0.226624 | 0.171590 | 0.231448 | 0.082132 | -0.044473 | 0.002842 | 0.242261 | 0.234581 | 0.226684 | 0.384560 | -0.128383 | -0.094787 | 0.077516 | 0.243980 | 0.134093 | -0.127758 | 0.121365 | 0.166853 | 0.249154 | -0.027813 | 1.000000 |
| new_tests_per_thousand | -0.002867 | -0.001406 | -0.001699 | -0.008226 | -0.008984 | -0.010318 | 0.306539 | 0.248827 | 0.282043 | 0.213855 | 0.171336 | 0.215499 | 0.113066 | 0.082291 | 0.227051 | 0.113784 | 0.311630 | 0.010118 | 0.031036 | 0.018438 | 0.027161 | 0.259000 | 0.129574 | 0.700227 | 1.000000 | 0.195554 | 0.841009 | 0.034472 | 0.022400 | 0.007601 | 0.009698 | 0.004542 | 0.004217 | 0.004911 | 0.173409 | 0.136969 | 0.095031 | 0.140798 | 0.082602 | -0.035225 | -0.009117 | 0.196737 | 0.203759 | 0.197701 | 0.272082 | -0.100472 | -0.070550 | 0.042892 | 0.215084 | 0.111272 | -0.103854 | 0.114516 | 0.128829 | 0.192909 | -0.019392 | 0.700120 |
| new_tests_smoothed | 0.184394 | 0.200244 | 0.204380 | 0.152508 | 0.140609 | 0.145490 | 0.201747 | 0.145248 | 0.174152 | 0.227330 | 0.116431 | 0.153056 | 0.080627 | 0.723640 | 0.290548 | 0.691770 | 0.205804 | 0.034354 | 0.014018 | 0.260891 | 0.056558 | 0.959910 | 0.877601 | 0.234009 | 0.195554 | 1.000000 | 0.236990 | 0.053568 | -0.000302 | 0.085584 | 0.109490 | 0.090903 | 0.085717 | 0.103846 | 0.100421 | 0.108343 | 0.050790 | 0.075685 | 0.106010 | 0.055768 | -0.010963 | 0.143650 | 0.163709 | 0.161407 | 0.163299 | -0.043521 | -0.040159 | 0.060796 | 0.154964 | 0.069635 | -0.053442 | 0.066067 | 0.083632 | 0.136084 | -0.010460 | 0.233772 |
| new_tests_smoothed_per_thousand | -0.003935 | -0.003066 | -0.002600 | -0.010534 | -0.013168 | -0.013247 | 0.387508 | 0.293487 | 0.348233 | 0.266208 | 0.197733 | 0.257801 | 0.134845 | 0.104286 | 0.276112 | 0.135677 | 0.358578 | 0.045339 | 0.071336 | 0.041091 | 0.060606 | 0.224101 | 0.149869 | 0.808760 | 0.841009 | 0.236990 | 1.000000 | 0.044084 | 0.054507 | 0.008228 | 0.010767 | 0.005416 | 0.004095 | 0.005433 | 0.203618 | 0.163401 | 0.114343 | 0.167616 | 0.095737 | -0.044598 | 0.027929 | 0.248177 | 0.257854 | 0.250638 | 0.349434 | -0.127458 | -0.096131 | 0.058108 | 0.267484 | 0.140483 | -0.132594 | 0.144199 | 0.166006 | 0.246570 | -0.025004 | 0.808666 |
| positive_rate | -0.031146 | -0.021606 | -0.021468 | -0.026620 | -0.011568 | -0.012398 | 0.200742 | 0.274428 | 0.324376 | 0.246467 | 0.278953 | 0.352966 | 0.279054 | 0.047774 | 0.145644 | 0.084717 | 0.213709 | 0.024463 | 0.037657 | 0.029051 | 0.059068 | 0.051630 | 0.038318 | 0.045558 | 0.034472 | 0.053568 | 0.044084 | 1.000000 | -0.059293 | -0.023491 | -0.022768 | -0.019281 | -0.022766 | -0.025715 | -0.008491 | -0.005893 | -0.002592 | -0.023333 | 0.238004 | -0.062091 | -0.049598 | 0.172592 | 0.142323 | 0.121320 | 0.038846 | -0.010356 | -0.008876 | 0.049870 | 0.107077 | 0.075188 | 0.127191 | 0.036745 | 0.147926 | 0.187903 | -0.004562 | 0.044721 |
| tests_per_case | -0.016797 | -0.019654 | -0.019736 | -0.018823 | -0.021013 | -0.021468 | -0.042170 | -0.044086 | -0.051688 | -0.049904 | -0.038170 | -0.049257 | -0.001919 | -0.010850 | -0.022177 | -0.012767 | -0.022640 | -0.004593 | -0.004930 | -0.004227 | -0.005786 | -0.006248 | -0.000818 | 0.049268 | 0.022400 | -0.000302 | 0.054507 | -0.059293 | 1.000000 | -0.007437 | -0.007608 | -0.006871 | -0.007525 | -0.008812 | -0.009393 | -0.010211 | -0.007267 | -0.011795 | 0.016976 | -0.019104 | 0.038058 | 0.086532 | 0.071289 | 0.071439 | 0.093266 | -0.045423 | -0.053766 | 0.006233 | 0.025387 | 0.016185 | 0.005201 | 0.024104 | 0.065463 | 0.074840 | 0.004192 | 0.049068 |
| total_vaccinations | 0.709946 | 0.388810 | 0.392009 | 0.665703 | 0.414428 | 0.434847 | 0.103211 | 0.022902 | 0.028140 | 0.109736 | 0.029405 | 0.040672 | -0.032688 | 0.069731 | 0.018971 | 0.063867 | 0.008063 | 0.001039 | 0.000457 | 0.024469 | 0.002859 | 0.080641 | 0.141127 | 0.025935 | 0.007601 | 0.085584 | 0.008228 | -0.023491 | -0.007437 | 1.000000 | 0.985054 | 0.957749 | 0.875001 | 0.960024 | 0.118665 | 0.122346 | 0.090418 | 0.082558 | -0.073245 | 0.311227 | -0.010071 | -0.036586 | -0.012049 | -0.012172 | -0.006237 | -0.015202 | -0.051262 | -0.028161 | -0.006693 | -0.011914 | -0.001401 | -0.020649 | -0.085711 | -0.047074 | -0.002702 | 0.025837 |
| people_vaccinated | 0.716938 | 0.397605 | 0.401355 | 0.677479 | 0.429076 | 0.450076 | 0.123252 | 0.030287 | 0.037136 | 0.134283 | 0.039161 | 0.053856 | -0.032454 | 0.087829 | 0.027037 | 0.081964 | 0.013947 | 0.001522 | 0.000732 | 0.030287 | 0.004342 | 0.103696 | 0.173132 | 0.029039 | 0.009698 | 0.109490 | 0.010767 | -0.022768 | -0.007608 | 0.985054 | 1.000000 | 0.972234 | 0.858912 | 0.943917 | 0.128722 | 0.145309 | 0.099634 | 0.092726 | -0.071000 | 0.287409 | -0.010151 | -0.035110 | -0.008371 | -0.008399 | -0.004519 | -0.015686 | -0.053841 | -0.029690 | -0.002798 | -0.012807 | -0.002428 | -0.020879 | -0.085441 | -0.045670 | -0.002199 | 0.028937 |
| people_fully_vaccinated | 0.637546 | 0.328983 | 0.328814 | 0.602474 | 0.352333 | 0.370088 | 0.120889 | 0.026165 | 0.031701 | 0.124896 | 0.031898 | 0.044109 | -0.029463 | 0.077317 | 0.022048 | 0.068257 | 0.009229 | 0.002628 | 0.002406 | 0.029554 | 0.003761 | 0.083187 | 0.165012 | 0.026121 | 0.004542 | 0.090903 | 0.005416 | -0.019281 | -0.006871 | 0.957749 | 0.972234 | 1.000000 | 0.807367 | 0.887331 | 0.139038 | 0.145626 | 0.130317 | 0.093888 | -0.064170 | 0.242055 | -0.009298 | -0.031113 | -0.006861 | -0.007080 | -0.001097 | -0.014860 | -0.048160 | -0.024715 | -0.000799 | -0.011158 | -0.004413 | -0.016593 | -0.075702 | -0.039920 | -0.001935 | 0.026033 |
| new_vaccinations | 0.699287 | 0.422973 | 0.425698 | 0.653199 | 0.456267 | 0.467670 | 0.094056 | 0.025943 | 0.030964 | 0.101472 | 0.033830 | 0.044128 | -0.032368 | 0.074826 | 0.020296 | 0.069302 | 0.009232 | -0.000898 | -0.000671 | 0.026390 | 0.002078 | 0.081093 | 0.138028 | 0.018542 | 0.004217 | 0.085717 | 0.004095 | -0.022766 | -0.007525 | 0.875001 | 0.858912 | 0.807367 | 1.000000 | 0.899155 | 0.089690 | 0.095552 | 0.065368 | 0.072363 | -0.074309 | 0.318310 | -0.010010 | -0.038497 | -0.014155 | -0.014160 | -0.009345 | -0.013891 | -0.051031 | -0.028608 | -0.008240 | -0.012932 | 0.001317 | -0.022176 | -0.087618 | -0.049239 | -0.002707 | 0.018442 |
| new_vaccinations_smoothed | 0.766562 | 0.454275 | 0.463595 | 0.716121 | 0.487689 | 0.511886 | 0.107851 | 0.028801 | 0.036003 | 0.115260 | 0.036462 | 0.050147 | -0.034495 | 0.096462 | 0.026192 | 0.090230 | 0.012564 | 0.001147 | -0.000036 | 0.035353 | 0.004336 | 0.097095 | 0.164505 | 0.023061 | 0.004911 | 0.103846 | 0.005433 | -0.025715 | -0.008812 | 0.960024 | 0.943917 | 0.887331 | 0.899155 | 1.000000 | 0.100958 | 0.107417 | 0.073796 | 0.084578 | -0.077143 | 0.352143 | -0.011321 | -0.036704 | -0.011687 | -0.012751 | -0.008188 | -0.018000 | -0.055207 | -0.028498 | -0.008931 | -0.007200 | -0.002696 | -0.020466 | -0.092898 | -0.050473 | -0.003393 | 0.022941 |
| total_vaccinations_per_hundred | 0.061655 | 0.025148 | 0.026080 | 0.053589 | 0.027582 | 0.029955 | 0.250308 | 0.128378 | 0.152240 | 0.164580 | 0.059193 | 0.080723 | -0.024869 | 0.048609 | 0.098063 | 0.056429 | 0.084606 | 0.034184 | 0.056805 | 0.018043 | 0.020414 | 0.096496 | 0.118158 | 0.296303 | 0.173409 | 0.100421 | 0.203618 | -0.008491 | -0.009393 | 0.118665 | 0.128722 | 0.139038 | 0.089690 | 0.100958 | 1.000000 | 0.918229 | 0.892749 | 0.633313 | -0.027286 | -0.005554 | 0.033330 | -0.000745 | 0.028840 | 0.024467 | 0.068694 | -0.051714 | -0.080507 | -0.010601 | 0.049559 | 0.009010 | -0.061014 | 0.002118 | 0.038295 | -0.006267 | -0.011414 | 0.296202 |
| people_vaccinated_per_hundred | 0.069220 | 0.029821 | 0.030955 | 0.062107 | 0.034202 | 0.036991 | 0.261211 | 0.135856 | 0.158618 | 0.193320 | 0.072336 | 0.096940 | -0.026788 | 0.060465 | 0.111768 | 0.072026 | 0.100773 | 0.036544 | 0.060279 | 0.022374 | 0.023955 | 0.103842 | 0.122362 | 0.226624 | 0.136969 | 0.108343 | 0.163401 | -0.005893 | -0.010211 | 0.122346 | 0.145309 | 0.145626 | 0.095552 | 0.107417 | 0.918229 | 1.000000 | 0.882632 | 0.634031 | -0.023058 | -0.006301 | 0.037979 | -0.001204 | 0.046130 | 0.041364 | 0.054583 | -0.052676 | -0.095189 | -0.032465 | 0.065581 | 0.000998 | -0.066005 | 0.010227 | 0.036722 | -0.013074 | -0.010258 | 0.226494 |
| people_fully_vaccinated_per_hundred | 0.045063 | 0.015983 | 0.016457 | 0.038132 | 0.017284 | 0.018918 | 0.199877 | 0.093364 | 0.112353 | 0.123952 | 0.041953 | 0.058043 | -0.025987 | 0.033312 | 0.088027 | 0.035746 | 0.068859 | 0.034657 | 0.058159 | 0.012907 | 0.017885 | 0.046838 | 0.073246 | 0.171590 | 0.095031 | 0.050790 | 0.114343 | -0.002592 | -0.007267 | 0.090418 | 0.099634 | 0.130317 | 0.065368 | 0.073796 | 0.892749 | 0.882632 | 1.000000 | 0.508073 | -0.026437 | -0.005842 | 0.032462 | -0.009867 | 0.028714 | 0.023995 | 0.035012 | -0.037047 | -0.071317 | -0.029801 | 0.046449 | 0.000693 | -0.049580 | 0.006791 | 0.031069 | -0.014912 | -0.008141 | 0.171501 |
| new_vaccinations_smoothed_per_million | 0.048157 | 0.018978 | 0.020343 | 0.039971 | 0.020900 | 0.023029 | 0.238136 | 0.139727 | 0.166602 | 0.155402 | 0.070315 | 0.092980 | -0.056890 | 0.049908 | 0.083225 | 0.059107 | 0.080413 | 0.027089 | 0.045318 | 0.019192 | 0.018123 | 0.071248 | 0.092846 | 0.231448 | 0.140798 | 0.075685 | 0.167616 | -0.023333 | -0.011795 | 0.082558 | 0.092726 | 0.093888 | 0.072363 | 0.084578 | 0.633313 | 0.634031 | 0.508073 | 1.000000 | -0.067837 | -0.014540 | 0.082324 | -0.068941 | -0.009645 | -0.013092 | 0.064651 | -0.073926 | -0.136471 | -0.024017 | 0.024590 | -0.033433 | -0.080367 | -0.003140 | 0.045405 | -0.085219 | -0.014338 | 0.231352 |
| stringency_index | -0.143389 | -0.158905 | -0.159214 | -0.160678 | -0.166023 | -0.169794 | 0.047408 | 0.060453 | 0.076521 | 0.067364 | 0.087141 | 0.116187 | 0.418368 | 0.072597 | 0.144835 | 0.089009 | 0.130002 | 0.027721 | 0.027692 | 0.026894 | 0.029022 | 0.097247 | 0.072322 | 0.082132 | 0.082602 | 0.106010 | 0.095737 | 0.238004 | 0.016976 | -0.073245 | -0.071000 | -0.064170 | -0.074309 | -0.077143 | -0.027286 | -0.023058 | -0.026437 | -0.067837 | 1.000000 | -0.194006 | 0.017389 | 0.289722 | 0.161853 | 0.140006 | 0.137627 | -0.042154 | 0.159565 | 0.136867 | 0.124935 | 0.208290 | 0.138666 | 0.088965 | 0.329506 | 0.316832 | 0.026320 | 0.079944 |
| population | 0.603536 | 0.661523 | 0.664703 | 0.640373 | 0.679298 | 0.695507 | -0.032864 | -0.029948 | -0.034918 | -0.017309 | -0.016930 | -0.021857 | -0.042066 | 0.008329 | -0.027614 | 0.002732 | -0.032953 | -0.005597 | -0.007795 | 0.002160 | -0.007501 | 0.055954 | 0.052154 | -0.044473 | -0.035225 | 0.055768 | -0.044598 | -0.062091 | -0.019104 | 0.311227 | 0.287409 | 0.242055 | 0.318310 | 0.352143 | -0.005554 | -0.006301 | -0.005842 | -0.014540 | -0.194006 | 1.000000 | -0.026434 | -0.110267 | -0.063992 | -0.066128 | -0.070108 | -0.021541 | -0.106414 | -0.075626 | -0.067073 | -0.016264 | 0.032874 | -0.061791 | -0.235857 | -0.147653 | 0.004737 | -0.044799 |
| population_density | -0.022085 | -0.025341 | -0.025452 | -0.025668 | -0.027743 | -0.028383 | 0.027198 | 0.027421 | 0.032288 | -0.024086 | -0.004708 | -0.005661 | -0.022857 | -0.012730 | -0.020657 | -0.014845 | -0.025507 | -0.003803 | -0.003417 | -0.004970 | -0.005164 | -0.015365 | -0.011999 | 0.002842 | -0.009117 | -0.010963 | 0.027929 | -0.049598 | 0.038058 | -0.010071 | -0.010151 | -0.009298 | -0.010010 | -0.011321 | 0.033330 | 0.037979 | 0.032462 | 0.082324 | 0.017389 | -0.026434 | 1.000000 | -0.078929 | -0.038343 | -0.047520 | 0.098251 | -0.055936 | -0.179121 | 0.026118 | -0.062127 | -0.063928 | -0.070068 | 0.268463 | 0.123856 | -0.128936 | 0.000378 | 0.002514 |
| median_age | -0.076957 | -0.086924 | -0.087217 | -0.087175 | -0.093805 | -0.095988 | 0.128905 | 0.130760 | 0.152835 | 0.137055 | 0.125312 | 0.162006 | 0.376137 | 0.108314 | 0.262480 | 0.137496 | 0.274021 | 0.046404 | 0.049389 | 0.042488 | 0.065574 | 0.120822 | 0.095470 | 0.242261 | 0.196737 | 0.143650 | 0.248177 | 0.172592 | 0.086532 | -0.036586 | -0.035110 | -0.031113 | -0.038497 | -0.036704 | -0.000745 | -0.001204 | -0.009867 | -0.068941 | 0.289722 | -0.110267 | -0.078929 | 1.000000 | 0.853135 | 0.837727 | 0.541387 | -0.284861 | 0.151234 | 0.286323 | 0.596544 | 0.509389 | 0.010280 | 0.533658 | 0.660811 | 0.804194 | 0.002433 | 0.241955 |
| aged_65_older | -0.028515 | -0.028955 | -0.029013 | -0.030545 | -0.032032 | -0.032732 | 0.191239 | 0.201507 | 0.235662 | 0.261580 | 0.204168 | 0.264513 | 0.294493 | 0.152195 | 0.363494 | 0.194530 | 0.376931 | 0.067425 | 0.070503 | 0.058814 | 0.089623 | 0.136676 | 0.101786 | 0.234581 | 0.203759 | 0.163709 | 0.257854 | 0.142323 | 0.071289 | -0.012049 | -0.008371 | -0.006861 | -0.014155 | -0.011687 | 0.028840 | 0.046130 | 0.028714 | -0.009645 | 0.161853 | -0.063992 | -0.038343 | 0.853135 | 1.000000 | 0.965808 | 0.510197 | -0.314490 | -0.061668 | 0.077630 | 0.770230 | 0.417560 | -0.163717 | 0.627768 | 0.495638 | 0.690167 | 0.019838 | 0.233574 |
| aged_70_older | -0.028776 | -0.029078 | -0.029132 | -0.029817 | -0.031190 | -0.031858 | 0.182706 | 0.191215 | 0.223876 | 0.264115 | 0.203046 | 0.263256 | 0.281492 | 0.154094 | 0.372616 | 0.195155 | 0.381943 | 0.067026 | 0.069327 | 0.059004 | 0.090200 | 0.131513 | 0.097218 | 0.226684 | 0.197701 | 0.161407 | 0.250638 | 0.121320 | 0.071439 | -0.012172 | -0.008399 | -0.007080 | -0.014160 | -0.012751 | 0.024467 | 0.041364 | 0.023995 | -0.013092 | 0.140006 | -0.066128 | -0.047520 | 0.837727 | 0.965808 | 1.000000 | 0.492902 | -0.315749 | -0.098685 | 0.018650 | 0.729027 | 0.380441 | -0.204727 | 0.608093 | 0.485926 | 0.638597 | 0.022357 | 0.225655 |
| gdp_per_capita | -0.023861 | -0.025403 | -0.025329 | -0.028874 | -0.031716 | -0.032321 | 0.263809 | 0.186934 | 0.219669 | 0.200522 | 0.111495 | 0.145045 | 0.180238 | 0.146779 | 0.263075 | 0.154903 | 0.218247 | 0.039510 | 0.039891 | 0.050983 | 0.043464 | 0.143164 | 0.116549 | 0.384560 | 0.272082 | 0.163299 | 0.349434 | 0.038846 | 0.093266 | -0.006237 | -0.004519 | -0.001097 | -0.009345 | -0.008188 | 0.068694 | 0.054583 | 0.035012 | 0.064651 | 0.137627 | -0.070108 | 0.098251 | 0.541387 | 0.510197 | 0.492902 | 1.000000 | -0.346698 | -0.284708 | 0.282379 | 0.405520 | 0.192875 | -0.273995 | 0.342248 | 0.425693 | 0.548965 | -0.018833 | 0.384017 |
| extreme_poverty | -0.030544 | -0.036756 | -0.036887 | -0.037296 | -0.040879 | -0.041816 | -0.194549 | -0.156516 | -0.183398 | -0.190581 | -0.125432 | -0.162764 | -0.011607 | -0.042952 | -0.097072 | -0.053775 | -0.103530 | -0.019339 | -0.021763 | -0.015782 | -0.021773 | -0.035680 | -0.026701 | -0.128383 | -0.100472 | -0.043521 | -0.127458 | -0.010356 | -0.045423 | -0.015202 | -0.015686 | -0.014860 | -0.013891 | -0.018000 | -0.051714 | -0.052676 | -0.037047 | -0.073926 | -0.042154 | -0.021541 | -0.055936 | -0.284861 | -0.314490 | -0.315749 | -0.346698 | 1.000000 | 0.191294 | -0.273647 | -0.241286 | -0.090650 | -0.023015 | -0.317773 | -0.145519 | -0.253455 | -0.004740 | -0.129305 |
| cardiovasc_death_rate | -0.103823 | -0.119975 | -0.120503 | -0.123561 | -0.131013 | -0.134150 | -0.162586 | -0.110503 | -0.130390 | -0.210793 | -0.094610 | -0.124260 | 0.127363 | -0.068656 | -0.116509 | -0.083234 | -0.067565 | -0.031350 | -0.018713 | -0.020448 | -0.007358 | -0.032787 | -0.016536 | -0.094787 | -0.070550 | -0.040159 | -0.096131 | -0.008876 | -0.053766 | -0.051262 | -0.053841 | -0.048160 | -0.051031 | -0.055207 | -0.080507 | -0.095189 | -0.071317 | -0.136471 | 0.159565 | -0.106414 | -0.179121 | 0.151234 | -0.061668 | -0.098685 | -0.284708 | 0.191294 | 1.000000 | 0.244890 | -0.041821 | 0.334512 | 0.279880 | 0.063484 | 0.170786 | 0.152652 | 0.007339 | -0.098590 |
| diabetes_prevalence | -0.060493 | -0.072198 | -0.072394 | -0.072404 | -0.078984 | -0.080795 | 0.042081 | 0.022212 | 0.026003 | -0.044598 | -0.017334 | -0.022725 | 0.052727 | 0.025804 | -0.037622 | 0.000321 | -0.065286 | -0.012927 | -0.011976 | 0.009414 | -0.005046 | 0.058241 | 0.059215 | 0.077516 | 0.042892 | 0.060796 | 0.058108 | 0.049870 | 0.006233 | -0.028161 | -0.029690 | -0.024715 | -0.028608 | -0.028498 | -0.010601 | -0.032465 | -0.029801 | -0.024017 | 0.136867 | -0.075626 | 0.026118 | 0.286323 | 0.077630 | 0.018650 | 0.282379 | -0.273647 | 0.244890 | 1.000000 | 0.005562 | 0.164340 | 0.144972 | 0.103632 | 0.436338 | 0.461805 | -0.016454 | 0.075327 |
| female_smokers | -0.016033 | -0.014383 | -0.014427 | -0.017543 | -0.016850 | -0.017246 | 0.290454 | 0.277451 | 0.324493 | 0.319525 | 0.239894 | 0.310304 | 0.237481 | 0.143981 | 0.346683 | 0.180451 | 0.374788 | 0.080540 | 0.078978 | 0.057261 | 0.093810 | 0.129709 | 0.094105 | 0.243980 | 0.215084 | 0.154964 | 0.267484 | 0.107077 | 0.025387 | -0.006693 | -0.002798 | -0.000799 | -0.008240 | -0.008931 | 0.049559 | 0.065581 | 0.046449 | 0.024590 | 0.124935 | -0.067073 | -0.062127 | 0.596544 | 0.770230 | 0.729027 | 0.405520 | -0.241286 | -0.041821 | 0.005562 | 1.000000 | 0.481034 | -0.230735 | 0.472167 | 0.357628 | 0.519716 | 0.029037 | 0.243168 |
| male_smokers | -0.028103 | -0.031940 | -0.032033 | -0.035624 | -0.035987 | -0.036835 | 0.090835 | 0.088948 | 0.104041 | 0.030308 | 0.051983 | 0.066786 | 0.215312 | 0.025479 | 0.091937 | 0.036959 | 0.121144 | 0.026038 | 0.032478 | 0.012390 | 0.034457 | 0.059662 | 0.049853 | 0.134093 | 0.111272 | 0.069635 | 0.140483 | 0.075188 | 0.016185 | -0.011914 | -0.012807 | -0.011158 | -0.012932 | -0.007200 | 0.009010 | 0.000998 | 0.000693 | -0.033433 | 0.208290 | -0.016264 | -0.063928 | 0.509389 | 0.417560 | 0.380441 | 0.192875 | -0.090650 | 0.334512 | 0.164340 | 0.481034 | 1.000000 | 0.128322 | 0.366958 | 0.340875 | 0.461399 | -0.000105 | 0.132629 |
| handwashing_facilities | 0.003269 | -0.000706 | -0.000643 | 0.004469 | 0.003875 | 0.004048 | -0.119412 | -0.110486 | -0.129319 | -0.101904 | -0.071446 | -0.092489 | 0.071026 | -0.070677 | -0.155586 | -0.086607 | -0.161093 | -0.029040 | -0.032793 | -0.027558 | -0.038569 | -0.042721 | -0.031025 | -0.127758 | -0.103854 | -0.053442 | -0.132594 | 0.127191 | 0.005201 | -0.001401 | -0.002428 | -0.004413 | 0.001317 | -0.002696 | -0.061014 | -0.066005 | -0.049580 | -0.080367 | 0.138666 | 0.032874 | -0.070068 | 0.010280 | -0.163717 | -0.204727 | -0.273995 | -0.023015 | 0.279880 | 0.144972 | -0.230735 | 0.128322 | 1.000000 | -0.121914 | 0.089155 | 0.048794 | 0.009637 | -0.129136 |
| hospital_beds_per_thousand | -0.042292 | -0.045197 | -0.045448 | -0.049499 | -0.049041 | -0.050265 | 0.137358 | 0.145796 | 0.169874 | 0.143193 | 0.133625 | 0.172452 | 0.172320 | 0.053769 | 0.191935 | 0.067886 | 0.222383 | 0.039539 | 0.042805 | 0.021923 | 0.057214 | 0.046531 | 0.032103 | 0.121365 | 0.114516 | 0.066067 | 0.144199 | 0.036745 | 0.024104 | -0.020649 | -0.020879 | -0.016593 | -0.022176 | -0.020466 | 0.002118 | 0.010227 | 0.006791 | -0.003140 | 0.088965 | -0.061791 | 0.268463 | 0.533658 | 0.627768 | 0.608093 | 0.342248 | -0.317773 | 0.063484 | 0.103632 | 0.472167 | 0.366958 | -0.121914 | 1.000000 | 0.408082 | 0.452838 | 0.003035 | 0.120056 |
| life_expectancy | -0.181257 | -0.207708 | -0.208477 | -0.205350 | -0.222544 | -0.227720 | 0.125502 | 0.104595 | 0.122631 | 0.087003 | 0.060635 | 0.078847 | 0.333756 | 0.062059 | 0.143717 | 0.077994 | 0.137395 | 0.029031 | 0.029440 | 0.023308 | 0.031422 | 0.071982 | 0.055947 | 0.166853 | 0.128829 | 0.083632 | 0.166006 | 0.147926 | 0.065463 | -0.085711 | -0.085441 | -0.075702 | -0.087618 | -0.092898 | 0.038295 | 0.036722 | 0.031069 | 0.045405 | 0.329506 | -0.235857 | 0.123856 | 0.660811 | 0.495638 | 0.485926 | 0.425693 | -0.145519 | 0.170786 | 0.436338 | 0.357628 | 0.340875 | 0.089155 | 0.408082 | 1.000000 | 0.751472 | 0.001072 | 0.167732 |
| human_development_index | -0.101104 | -0.115344 | -0.115709 | -0.114825 | -0.124281 | -0.127139 | 0.163102 | 0.141638 | 0.166580 | 0.131226 | 0.107483 | 0.139706 | 0.357612 | 0.101677 | 0.213604 | 0.120263 | 0.208631 | 0.039938 | 0.044893 | 0.038155 | 0.049089 | 0.118512 | 0.095064 | 0.249154 | 0.192909 | 0.136084 | 0.246570 | 0.187903 | 0.074840 | -0.047074 | -0.045670 | -0.039920 | -0.049239 | -0.050473 | -0.006267 | -0.013074 | -0.014912 | -0.085219 | 0.316832 | -0.147653 | -0.128936 | 0.804194 | 0.690167 | 0.638597 | 0.548965 | -0.253455 | 0.152652 | 0.461805 | 0.519716 | 0.461399 | 0.048794 | 0.452838 | 0.751472 | 1.000000 | -0.005875 | 0.249810 |
| death_rate | -0.006046 | -0.007221 | -0.007022 | -0.001535 | 0.008762 | 0.009072 | -0.026569 | -0.031092 | -0.035884 | 0.011712 | 0.041898 | 0.050747 | -0.039500 | 0.002655 | 0.016448 | 0.008314 | 0.014673 | 0.005702 | -0.001001 | 0.001086 | 0.000277 | -0.009751 | -0.008914 | -0.027813 | -0.019392 | -0.010460 | -0.025004 | -0.004562 | 0.004192 | -0.002702 | -0.002199 | -0.001935 | -0.002707 | -0.003393 | -0.011414 | -0.010258 | -0.008141 | -0.014338 | 0.026320 | 0.004737 | 0.000378 | 0.002433 | 0.019838 | 0.022357 | -0.018833 | -0.004740 | 0.007339 | -0.016454 | 0.029037 | -0.000105 | 0.009637 | 0.003035 | 0.001072 | -0.005875 | 1.000000 | -0.027813 |
| population_coverage | 0.008587 | 0.000681 | 0.001657 | -0.002447 | -0.009840 | -0.009179 | 0.456945 | 0.278986 | 0.334678 | 0.284074 | 0.168618 | 0.224186 | 0.125472 | 0.124600 | 0.255071 | 0.145460 | 0.286501 | 0.008722 | 0.045602 | 0.048545 | 0.045947 | 0.231107 | 0.233722 | 1.000000 | 0.700120 | 0.233772 | 0.808666 | 0.044721 | 0.049068 | 0.025837 | 0.028937 | 0.026033 | 0.018442 | 0.022941 | 0.296202 | 0.226494 | 0.171501 | 0.231352 | 0.079944 | -0.044799 | 0.002514 | 0.241955 | 0.233574 | 0.225655 | 0.384017 | -0.129305 | -0.098590 | 0.075327 | 0.243168 | 0.132629 | -0.129136 | 0.120056 | 0.167732 | 0.249810 | -0.027813 | 1.000000 |
#owid_covid_data = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
#owid_covid_data.head()
# Select the columns to clean
ColumnToClean = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths','aged_65_older','aged_70_older','gdp_per_capita','diabetes_prevalence','female_smokers','male_smokers','hospital_beds_per_thousand']
# Replace NaN in 'location' with an empty string
covid_df_copy[['location']] = covid_df_copy[['location']].fillna('')
# Replace NaN in the selected columns with 0
covid_df_copy[ColumnToClean] = covid_df_copy[ColumnToClean].fillna(0)
# Keep only the aggregate "World" rows
covid_df_copy = covid_df_copy.query('location=="World"')
# Build the regression frame from 'date' plus the cleaned columns
regression_columns = ['date'] + ColumnToClean
Data_For_Regression = pd.DataFrame(columns=regression_columns, data=covid_df_copy[regression_columns].values)
Data_For_Regression.head()
| | date | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 557 | 0 | 17 | 0 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 1 | 2020-01-23 | 655 | 98 | 18 | 1 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2 | 2020-01-24 | 941 | 286 | 26 | 8 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 3 | 2020-01-25 | 1433 | 492 | 42 | 16 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 4 | 2020-01-26 | 2118 | 685 | 56 | 14 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
#set the index as date
Data_For_Regression['date'] = pd.to_datetime(Data_For_Regression['date'])
Data_For_Regression = Data_For_Regression.set_index('date')
Data_For_Regression.head()
| date | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-22 | 557 | 0 | 17 | 0 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-23 | 655 | 98 | 18 | 1 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-24 | 941 | 286 | 26 | 8 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-25 | 1433 | 492 | 42 | 16 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
| 2020-01-26 | 2118 | 685 | 56 | 14 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 |
#Plot the graph
Data_For_Regression['total_cases'].plot(figsize=(15,6), color="green")
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Cases')
plt.show()
Data_For_Regression['total_deaths'].plot(figsize=(15,6), color="red")
plt.xlabel('Date')
plt.ylabel('Death')
plt.show()
Data_For_Regression['new_cases'].plot(figsize=(15,6), color="blue")
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.show()
# Use total deaths as the forecast column
forecast_col = 'total_deaths'
# Forecast horizon: 30 days
forecast_out = 30
print('length =',len(Data_For_Regression), "and forecast_out =", forecast_out)
length = 424 and forecast_out = 30
# Creating label by shifting 'total_deaths' according to 'forecast_out'
Data_For_Regression['temp'] = Data_For_Regression[forecast_col].shift(-forecast_out)
print(Data_For_Regression.head(2))
print('\n')
# verify rows with NAN in Label column
print(Data_For_Regression.tail(2))
total_cases new_cases total_deaths new_deaths aged_65_older \
date
2020-01-22 557 0 17 0 8.696
2020-01-23 655 98 18 1 8.696
aged_70_older gdp_per_capita diabetes_prevalence female_smokers \
date
2020-01-22 5.355 15469.2 8.51 6.434
2020-01-23 5.355 15469.2 8.51 6.434
male_smokers hospital_beds_per_thousand temp
date
2020-01-22 34.635 2.705 2252
2020-01-23 34.635 2.705 2459
total_cases new_cases total_deaths new_deaths aged_65_older \
date
2021-03-19 1.22316e+08 526273 2.70144e+06 10410 8.696
2021-03-20 1.22814e+08 498140 2.70964e+06 8194 8.696
aged_70_older gdp_per_capita diabetes_prevalence female_smokers \
date
2021-03-19 5.355 15469.2 8.51 6.434
2021-03-20 5.355 15469.2 8.51 6.434
male_smokers hospital_beds_per_thousand temp
date
2021-03-19 34.635 2.705 NaN
2021-03-20 34.635 2.705 NaN
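The label construction above can be seen more easily on a tiny series. The sketch below is only an illustration: the values, the `horizon` of 2 and the names `toy`, `labelled` and `unlabelled` are made up for this example and are not part of the notebook.

```python
import pandas as pd

# Toy cumulative series standing in for 'total_deaths' (values are made up).
toy = pd.DataFrame(
    {"total_deaths": [10, 12, 15, 19, 24, 30]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

horizon = 2  # hypothetical horizon, playing the role of forecast_out

# shift(-horizon) moves each value 'horizon' rows earlier, so every row is
# paired with the value observed 'horizon' days later.
toy["label"] = toy["total_deaths"].shift(-horizon)

# The last 'horizon' rows have no future value yet, so their label is NaN.
labelled = toy.dropna(subset=["label"])
unlabelled = toy[toy["label"].isna()]

print(toy)
```

The rows in `labelled` correspond to the rows that can be used for training, and the rows in `unlabelled` correspond to the rows kept aside for forecasting.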
# Define the feature matrix X by excluding the label column we just created
X = np.array(Data_For_Regression.drop(columns=['temp']))
# Standardize the features with sklearn's preprocessing.scale
X = preprocessing.scale(X)
print(X[1,:])
[-0.96277535 -1.30745477 -1.17772698 -1.57016705 0. 0. 0. 0. 0. 0. 0. ]
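The printed row shows zeros in the last seven positions. Those columns (aged_65_older through hospital_beds_per_thousand) hold a single constant value for the "World" rows, so after standardization they carry no information for the regression. The following is a minimal NumPy sketch of column-wise standardization with that case handled explicitly; the function name `standardize` and the toy matrix are assumptions for illustration, not the sklearn implementation.

```python
import numpy as np

def standardize(columns):
    """Column-wise (x - mean) / std; constant columns are set to 0."""
    mean = columns.mean(axis=0)
    std = columns.std(axis=0)
    # A column is constant when its largest and smallest values coincide.
    constant = columns.max(axis=0) == columns.min(axis=0)
    # Avoid dividing by zero for constant columns.
    safe_std = np.where(constant, 1.0, std)
    scaled = (columns - mean) / safe_std
    scaled[:, constant] = 0.0
    return scaled, constant

# First column varies, second column is constant (made-up values).
toy = np.array([[1.0, 2.5],
                [2.0, 2.5],
                [3.0, 2.5]])

scaled, constant = standardize(toy)
print(scaled)
print("constant columns:", constant)

# Optionally keep only the columns that actually vary.
informative = toy[:, ~constant]
```

Dropping the constant columns before fitting would leave the fitted predictions unchanged in this setting, since an all-zero feature contributes nothing.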
# The last forecast_out rows of X have no label yet
# Keep them aside as X_forecast_out and train on the remaining rows
X_forecast_out = X[-forecast_out:]
X = X[:-forecast_out]
print("Length of X_forecast_out:", len(X_forecast_out), "& Length of X :", len(X))
Length of X_forecast_out: 30 & Length of X : 394
# Define vector y for the data we have prediction for
# make sure length of X and y are identical
y = np.array(Data_For_Regression['temp'])
y = y[:-forecast_out]
print('Length of y: ',len(y))
Length of y: 394
# (split into test and train data)
# test_size = 0.2 ==> 20% data is test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print('length of X_train and x_test: ', len(X_train), len(X_test))
length of X_train and x_test: 315 79
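`train_test_split` shuffles the rows by default, so test days end up interleaved with training days. For a daily cumulative series this may make the test score look better than a true forecast would. One alternative, sketched here under the assumption that rows are already sorted by date, is to hold out the most recent rows. The helper `chronological_split` and the toy arrays are hypothetical names for this illustration.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Hold out the most recent rows instead of a random sample."""
    n_test = int(round(len(X) * test_fraction))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Toy data: 10 time-ordered rows with 2 features each (made-up values).
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

X_tr, X_te, y_tr, y_te = chronological_split(X_toy, y_toy)
print(len(X_tr), len(X_te))
```

With this split, every test row is later in time than every training row, which is closer to how the 30-day forecast is actually used.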
# Create linear regression object
lr = LinearRegression()
# Train the model using the training sets
lr.fit(X_train, y_train)
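# Evaluate on the test set. For a regressor, score() returns the coefficient of determination (R^2), not a classification accuracy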
accuracy = lr.score(X_test, y_test)
print("Accuracy of Linear Regression: ", accuracy)
Accuracy of Linear Regression: 0.9977341359065376
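For `LinearRegression`, `score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, so the value printed above is an R^2 rather than a share of correct predictions. A small hand-rolled version, with made-up numbers and a hypothetical helper name `r_squared`, shows what the two extremes mean:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]

# Predicting every value exactly gives R^2 = 1.
perfect = r_squared(y_true, y_true)

# Always predicting the mean of y_true gives R^2 = 0.
mean_only = r_squared(y_true, [6.0] * 4)

print(perfect, mean_only)
```

A strongly trending target such as cumulative deaths has a very large SS_tot, so a high R^2 is easy to reach and does not by itself show that the 30-day forecast is accurate.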
# Predict using our Model
forecast_prediction = lr.predict(X_forecast_out)
print(forecast_prediction)
[2761369.60822026 2742437.89032144 2716729.72549334 2723886.09683447 2788747.50374794 2821086.80735987 2819915.21601815 2830601.30857385 2810759.52283328 2776173.46813663 2795955.78519842 2827930.95955923 2878011.83052154 2883045.16751441 2893069.35882272 2877604.12019003 2855573.53005105 2854086.52866266 2915178.08026103 2937093.14712452 2950239.62234392 2962518.40862667 2956578.76082901 2916684.33256088 2929072.89997022 2992631.12646166 3021011.48961908 3038624.40664073 3041431.8177346 3028397.38128222]
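The first few predicted values above go down before they go up, although a cumulative death count cannot decrease. A quick sanity check on the forecast could look like the sketch below. It uses the first five printed predictions rounded to one decimal, and the running-maximum repair is only one possible post-processing choice, not something the notebook does.

```python
import numpy as np

# First five predictions from the output above, rounded to one decimal.
preds = np.array([2761369.6, 2742437.9, 2716729.7, 2723886.1, 2788747.5])

# Day-to-day changes; negative entries are impossible for a cumulative count.
steps = np.diff(preds)
decreasing_steps = int((steps < 0).sum())
print("decreasing steps:", decreasing_steps)

# One simple repair: carry the running maximum forward.
monotone = np.maximum.accumulate(preds)
print(monotone)
```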
Data_For_Regression.tail()
| date | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand | temp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-16 | 1.20697e+08 | 472966 | 2.67045e+06 | 9997 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-17 | 1.21236e+08 | 538804 | 2.68052e+06 | 10063 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-18 | 1.21789e+08 | 553312 | 2.69104e+06 | 10519 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-19 | 1.22316e+08 | 526273 | 2.70144e+06 | 10410 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
| 2021-03-20 | 1.22814e+08 | 498140 | 2.70964e+06 | 8194 | 8.696 | 5.355 | 15469.2 | 8.51 | 6.434 | 34.635 | 2.705 | NaN |
last_date = Data_For_Regression.iloc[-1].name
last_date
Timestamp('2021-03-20 00:00:00')
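The forecast rows need dates that continue from the last observed day. A minimal sketch of building such an index is shown here; `horizon`, `future_index`, `forecast_df` and the placeholder values are assumptions for illustration only.

```python
from datetime import timedelta

import pandas as pd

last_date = pd.Timestamp("2021-03-20")  # last observed day, as printed above
horizon = 30                            # plays the role of forecast_out

# A pandas Timestamp supports timedelta arithmetic directly, so no
# round trip through strftime/strptime is needed.
future_index = pd.date_range(last_date + timedelta(days=1), periods=horizon, freq="D")

# Placeholder values; in the notebook these would be the model predictions.
forecast_df = pd.DataFrame({"forecast": [0.0] * horizon}, index=future_index)

print(future_index[0], future_index[-1])
```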
# First forecast day is the day after the last observation
todays_date = last_date + timedelta(days=1)
index = pd.date_range(todays_date, periods=forecast_out, freq='D')
columns = ['total_cases', 'new_cases', 'total_deaths', 'new_deaths','aged_65_older','aged_70_older','gdp_per_capita','diabetes_prevalence','female_smokers','male_smokers','hospital_beds_per_thousand','temp','forecast']
temp_df = pd.DataFrame(index=index, columns=columns)
temp_df
| | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand | temp | forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-25 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-26 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-03-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-14 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2021-04-19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# Write each predicted value into the 'forecast' column (column position 12)
for j, value in enumerate(forecast_prediction):
    temp_df.iat[j, 12] = value
temp_df
| | total_cases | new_cases | total_deaths | new_deaths | aged_65_older | aged_70_older | gdp_per_capita | diabetes_prevalence | female_smokers | male_smokers | hospital_beds_per_thousand | temp | forecast |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-03-21 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.76137e+06 |
| 2021-03-22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.74244e+06 |
| 2021-03-23 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.71673e+06 |
| 2021-03-24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.72389e+06 |
| 2021-03-25 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.78875e+06 |
| 2021-03-26 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.82109e+06 |
| 2021-03-27 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.81992e+06 |
| 2021-03-28 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.8306e+06 |
| 2021-03-29 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.81076e+06 |
| 2021-03-30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.77617e+06 |
| 2021-03-31 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.79596e+06 |
| 2021-04-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.82793e+06 |
| 2021-04-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.87801e+06 |
| 2021-04-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.88305e+06 |
| 2021-04-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.89307e+06 |
| 2021-04-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.8776e+06 |
| 2021-04-06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.85557e+06 |
| 2021-04-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.85409e+06 |
| 2021-04-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.91518e+06 |
| 2021-04-09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.93709e+06 |
| 2021-04-10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.95024e+06 |
| 2021-04-11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.96252e+06 |
| 2021-04-12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.95658e+06 |
| 2021-04-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.91668e+06 |
| 2021-04-14 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.92907e+06 |
| 2021-04-15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.99263e+06 |
| 2021-04-16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.02101e+06 |
| 2021-04-17 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.03862e+06 |
| 2021-04-18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.04143e+06 |
| 2021-04-19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0284e+06 |
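The forecast frame above is filled one cell at a time. As a minimal sketch of an alternative, the same result can be built in a single vectorized step. The names `forecast_values` and `forecast_df` and the placeholder numbers are assumptions for illustration only; they are not the notebook's model output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's objects (illustration only).
last_date = pd.Timestamp("2021-03-20")
forecast_values = np.linspace(2.70e6, 3.00e6, 30)  # placeholder numbers, not model output

# One future date per forecast value, starting the day after the last observation.
future_index = pd.date_range(last_date + pd.Timedelta(days=1),
                             periods=len(forecast_values), freq="D")

# Assigning the whole array at once replaces the manual counter loop.
forecast_df = pd.DataFrame({"forecast": forecast_values}, index=future_index)

print(forecast_df.head(3))
```

Building the frame with only the `forecast` column also avoids carrying twelve all-NaN columns that are never filled.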
# Append the forecast - initially done for ease, but later decided to use XGBoost as well
Data_For_Regression['total_deaths'].plot(figsize=(15,6), color="red")
temp_df['forecast'].plot(figsize=(15,6), color="orange")
plt.xlabel('Date')
plt.ylabel('Death')
plt.show()
# XGBoost algorithm to see if we can get better results
xgb_model = xgb.XGBRegressor(
    objective='reg:squarederror',
    colsample_bytree=0.4,
    gamma=0,
    learning_rate=0.07,
    max_depth=3,
    min_child_weight=1.5,
    n_estimators=10000,
    reg_alpha=0.75,
    reg_lambda=0.45,
    subsample=0.6,
)
traindf, testdf = train_test_split(X_train, test_size = 0.2)
xgb_model.fit(X_train,y_train)
xgforecast_prediction = xgb_model.predict(X_forecast_out)
xgforecast_prediction
array([2704656.8, 2664913.5, 2536690.5, 2613490.5, 2718172.5, 2689632. ,
2673268.8, 2692658.8, 2692065.8, 2523291.5, 2579734.8, 2612704.2,
2710746.8, 2683083.2, 2674768.8, 2685463.5, 2556501.8, 2596244.8,
2673547. , 2670320.2, 2676758.2, 2687230.2, 2673305.8, 2559467. ,
2608374.2, 2668408.5, 2656320.8, 2672826.5, 2676134.5, 2675404.8],
dtype=float32)
# Overwrite the 'forecast' column of the temporary df with the XGBoost predictions
for j, value in enumerate(xgforecast_prediction):
    temp_df.iat[j, 12] = value
Data_For_Regression['total_deaths'].plot(figsize=(15,6), color="red")
temp_df['forecast'].plot(figsize=(15,6), color="orange")
plt.xlabel('Date')
plt.ylabel('Death')
plt.show()
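The two forecast plots are compared by eye. A minimal sketch of a numeric comparison is shown here, assuming the last few observed days are held out and each model's predictions for those days are available. The arrays and the helper names `rmse` and `mape` are hypothetical placeholders, not values or functions from the notebook.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Placeholder hold-out window and two sets of predictions (illustration only).
actual = np.array([100.0, 110.0, 120.0, 130.0])
model_a = np.array([102.0, 108.0, 123.0, 128.0])
model_b = np.array([90.0, 100.0, 110.0, 120.0])

for name, predicted in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "RMSE:", round(rmse(actual, predicted), 3),
          "MAPE %:", round(mape(actual, predicted), 3))
```

For a time series, the hold-out window should be the most recent days rather than a random sample, so that the model is always scored on dates later than those it was trained on.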
We decided to use the population over age 65, diabetes_prevalence, and cardiovasc_death_rate because, worldwide, over 80% of the deaths were in the population aged 65 and over, and the CDC has stated that 94% of deaths involved some underlying health condition. We also used life expectancy per country to account for possible deficiencies in the health care system. Johns Hopkins University has listed several diseases, such as heart disease and diabetes, that are linked to cardiovasc_death_rate and obesity. Our idea is that we can predict the mortality ratio of COVID-19 more accurately by using both the population aged 65 and over and obesity rather than the population aged 65 and over alone. This may show that creating a healthier population is the best way to prevent, in future pandemics, the devastation that the world is currently facing.
After viewing the graphs in the Linear Regression forecast, we saw the accuracy that the XGBoost algorithm can achieve with this data. We will continue and see whether our ML algorithm can do better than we expect. We initially chose classification with the HighRisk category, as that may be more accurate than regression. Or can we use more precise algorithms to build a learning model better suited to the data?
covid_df_copy = covid_df.copy()
covid_df_copy.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76215 entries, 0 to 76214
Data columns (total 61 columns):
 #   Column                                 Non-Null Count  Dtype
---  ------                                 --------------  -----
 0   iso_code                               76215 non-null  object
 1   continent                              72473 non-null  object
 2   location                               76215 non-null  object
 3   date                                   76215 non-null  object
 4   total_cases                            76215 non-null  float64
 5   new_cases                              76215 non-null  float64
 6   new_cases_smoothed                     76215 non-null  float64
 7   total_deaths                           76215 non-null  float64
 8   new_deaths                             76215 non-null  float64
 9   new_deaths_smoothed                    76215 non-null  float64
 10  total_cases_per_million                76215 non-null  float64
 11  new_cases_per_million                  76215 non-null  float64
 12  new_cases_smoothed_per_million         76215 non-null  float64
 13  total_deaths_per_million               76215 non-null  float64
 14  new_deaths_per_million                 76215 non-null  float64
 15  new_deaths_smoothed_per_million        76215 non-null  float64
 16  reproduction_rate                      76215 non-null  float64
 17  icu_patients                           76215 non-null  float64
 18  icu_patients_per_million               76215 non-null  float64
 19  hosp_patients                          76215 non-null  float64
 20  hosp_patients_per_million              76215 non-null  float64
 21  weekly_icu_admissions                  76215 non-null  float64
 22  weekly_icu_admissions_per_million      76215 non-null  float64
 23  weekly_hosp_admissions                 76215 non-null  float64
 24  weekly_hosp_admissions_per_million     76215 non-null  float64
 25  new_tests                              76215 non-null  float64
 26  total_tests                            76215 non-null  float64
 27  total_tests_per_thousand               76215 non-null  float64
 28  new_tests_per_thousand                 76215 non-null  float64
 29  new_tests_smoothed                     76215 non-null  float64
 30  new_tests_smoothed_per_thousand        76215 non-null  float64
 31  positive_rate                          76215 non-null  float64
 32  tests_per_case                         76215 non-null  float64
 33  tests_units                            40920 non-null  object
 34  total_vaccinations                     76215 non-null  float64
 35  people_vaccinated                      76215 non-null  float64
 36  people_fully_vaccinated                76215 non-null  float64
 37  new_vaccinations                       76215 non-null  float64
 38  new_vaccinations_smoothed              76215 non-null  float64
 39  total_vaccinations_per_hundred         76215 non-null  float64
 40  people_vaccinated_per_hundred          76215 non-null  float64
 41  people_fully_vaccinated_per_hundred    76215 non-null  float64
 42  new_vaccinations_smoothed_per_million  76215 non-null  float64
 43  stringency_index                       76215 non-null  float64
 44  population                             76215 non-null  float64
 45  population_density                     76215 non-null  float64
 46  median_age                             76215 non-null  float64
 47  aged_65_older                          76215 non-null  float64
 48  aged_70_older                          76215 non-null  float64
 49  gdp_per_capita                         76215 non-null  float64
 50  extreme_poverty                        76215 non-null  float64
 51  cardiovasc_death_rate                  76215 non-null  float64
 52  diabetes_prevalence                    76215 non-null  float64
 53  female_smokers                         76215 non-null  float64
 54  male_smokers                           76215 non-null  float64
 55  handwashing_facilities                 76215 non-null  float64
 56  hospital_beds_per_thousand             76215 non-null  float64
 57  life_expectancy                        76215 non-null  float64
 58  human_development_index                76215 non-null  float64
 59  death_rate                             68898 non-null  float64
 60  population_coverage                    75798 non-null  float64
dtypes: float64(56), object(5)
memory usage: 35.5+ MB
# Create a HighRisk column based on total_deaths_per_million (total deaths attributed to COVID-19 per 1,000,000 people).
# A row is flagged as high risk when its z-score is greater than 0.65.
covid_df_copy['HighRisk'] = zscore(covid_df_copy['total_deaths_per_million']) > 0.65
covid_df_copy.head(20)
| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | death_rate | population_coverage | HighRisk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 2020-02-24 | 1.0 | 1.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 1 | AFG | Asia | Afghanistan | 2020-02-25 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 2 | AFG | Asia | Afghanistan | 2020-02-26 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 3 | AFG | Asia | Afghanistan | 2020-02-27 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 4 | AFG | Asia | Afghanistan | 2020-02-28 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 5 | AFG | Asia | Afghanistan | 2020-02-29 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 6 | AFG | Asia | Afghanistan | 2020-03-01 | 1.0 | 0.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 7 | AFG | Asia | Afghanistan | 2020-03-02 | 1.0 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | NaN | 0.0 | False |
| 8 | AFG | Asia | Afghanistan | 2020-03-03 | 2.0 | 1.0 | 0.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 9 | AFG | Asia | Afghanistan | 2020-03-04 | 4.0 | 2.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 10 | AFG | Asia | Afghanistan | 2020-03-05 | 4.0 | 0.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 11 | AFG | Asia | Afghanistan | 2020-03-06 | 4.0 | 0.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 12 | AFG | Asia | Afghanistan | 2020-03-07 | 4.0 | 0.0 | 0.429 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 13 | AFG | Asia | Afghanistan | 2020-03-08 | 5.0 | 1.0 | 0.571 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 14 | AFG | Asia | Afghanistan | 2020-03-09 | 7.0 | 2.0 | 0.857 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 15 | AFG | Asia | Afghanistan | 2020-03-10 | 8.0 | 1.0 | 0.857 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 16 | AFG | Asia | Afghanistan | 2020-03-11 | 11.0 | 3.0 | 1.000 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 17 | AFG | Asia | Afghanistan | 2020-03-12 | 12.0 | 1.0 | 1.143 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 18 | AFG | Asia | Afghanistan | 2020-03-13 | 13.0 | 1.0 | 1.286 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
| 19 | AFG | Asia | Afghanistan | 2020-03-14 | 15.0 | 2.0 | 1.571 | 0.0 | 0.0 | 0.0 | ... | 9.59 | 0.0 | 0.0 | 37.746 | 0.5 | 64.83 | 0.511 | 0.0 | 0.0 | False |
20 rows × 62 columns
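To make the HighRisk rule concrete, the following is a minimal sketch of the same z-score threshold computed with NumPy alone. The `deaths_per_million` values are made up for illustration. The sketch uses the population standard deviation (`ddof=0`), which is also the default of `scipy.stats.zscore`.

```python
import numpy as np

# Made-up deaths-per-million values (illustration only, not from the dataset).
deaths_per_million = np.array([0.0, 5.0, 10.0, 20.0, 50.0, 400.0, 900.0, 1500.0])

# z-score: distance from the mean in units of the population standard deviation.
z = (deaths_per_million - deaths_per_million.mean()) / deaths_per_million.std()

# A row is flagged when it lies more than 0.65 standard deviations above the mean.
high_risk = z > 0.65

print("z-scores:", np.round(z, 3))
print("share flagged as high risk:", high_risk.mean())
```

The share of flagged rows depends on the shape of the distribution, so it is worth printing `high_risk.mean()` on the real column to see how imbalanced the two classes are before training a classifier.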
corr = covid_df_copy[['death_rate', 'total_deaths_per_million', 'aged_65_older','extreme_poverty', 'icu_patients', 'life_expectancy', 'cardiovasc_death_rate', 'diabetes_prevalence', 'human_development_index', 'population_density', 'aged_70_older', 'population_coverage']].corr()
corr.style.background_gradient(cmap='coolwarm')
# total_deaths_per_million: total deaths attributed to COVID-19 per 1,000,000 people
# With the new data, the correlations seem stronger than before for 'aged_70_older', 'cardiovasc_death_rate', 'diabetes_prevalence', 'human_development_index', 'population_density', and 'population_coverage'.
| | death_rate | total_deaths_per_million | aged_65_older | extreme_poverty | icu_patients | life_expectancy | cardiovasc_death_rate | diabetes_prevalence | human_development_index | population_density | aged_70_older | population_coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| death_rate | 1.000000 | 0.011712 | 0.019838 | -0.004740 | 0.002655 | 0.001072 | 0.007339 | -0.016454 | -0.005875 | 0.000378 | 0.022357 | -0.027813 |
| total_deaths_per_million | 0.011712 | 1.000000 | 0.261580 | -0.190581 | 0.209748 | 0.087003 | -0.210793 | -0.044598 | 0.131226 | -0.024086 | 0.264115 | 0.284074 |
| aged_65_older | 0.019838 | 0.261580 | 1.000000 | -0.314490 | 0.152195 | 0.495638 | -0.061668 | 0.077630 | 0.690167 | -0.038343 | 0.965808 | 0.233574 |
| extreme_poverty | -0.004740 | -0.190581 | -0.314490 | 1.000000 | -0.042952 | -0.145519 | 0.191294 | -0.273647 | -0.253455 | -0.055936 | -0.315749 | -0.129305 |
| icu_patients | 0.002655 | 0.209748 | 0.152195 | -0.042952 | 1.000000 | 0.062059 | -0.068656 | 0.025804 | 0.101677 | -0.012730 | 0.154094 | 0.124600 |
| life_expectancy | 0.001072 | 0.087003 | 0.495638 | -0.145519 | 0.062059 | 1.000000 | 0.170786 | 0.436338 | 0.751472 | 0.123856 | 0.485926 | 0.167732 |
| cardiovasc_death_rate | 0.007339 | -0.210793 | -0.061668 | 0.191294 | -0.068656 | 0.170786 | 1.000000 | 0.244890 | 0.152652 | -0.179121 | -0.098685 | -0.098590 |
| diabetes_prevalence | -0.016454 | -0.044598 | 0.077630 | -0.273647 | 0.025804 | 0.436338 | 0.244890 | 1.000000 | 0.461805 | 0.026118 | 0.018650 | 0.075327 |
| human_development_index | -0.005875 | 0.131226 | 0.690167 | -0.253455 | 0.101677 | 0.751472 | 0.152652 | 0.461805 | 1.000000 | -0.128936 | 0.638597 | 0.249810 |
| population_density | 0.000378 | -0.024086 | -0.038343 | -0.055936 | -0.012730 | 0.123856 | -0.179121 | 0.026118 | -0.128936 | 1.000000 | -0.047520 | 0.002514 |
| aged_70_older | 0.022357 | 0.264115 | 0.965808 | -0.315749 | 0.154094 | 0.485926 | -0.098685 | 0.018650 | 0.638597 | -0.047520 | 1.000000 | 0.225655 |
| population_coverage | -0.027813 | 0.284074 | 0.233574 | -0.129305 | 0.124600 | 0.167732 | -0.098590 | 0.075327 | 0.249810 | 0.002514 | 0.225655 | 1.000000 |
predictors = ['diabetes_prevalence','icu_patients','life_expectancy', 'cardiovasc_death_rate', 'human_development_index', 'aged_70_older', 'population_density', 'female_smokers', 'male_smokers', 'extreme_poverty']
target = 'HighRisk'
X = covid_df_copy[predictors].values
y = covid_df_copy[target].values
# Split the data into training and test sets, and scale
scaler = StandardScaler()
# unscaled version (note that scaling is only used on predictor variables)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# scaled version
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
print('First 10 Rows of Scaled Data: \n\n', X_train[:10], '\n')
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
accuracy = (predictions == y_test).mean()
print('Accuracy:', round(accuracy * 100, 2), '%')
First 10 Rows of Scaled Data:
[[ 2.16970375e+00 -9.92836241e-02 4.45157032e-01 -6.32460446e-01 7.48536558e-01 -8.31170980e-01 1.00417221e+00 -1.78831296e-01 7.75762571e-01 -4.86384094e-01]
 [ 2.22119711e+00 -9.92836241e-02 3.80613143e-01 1.84617450e-01 5.78461190e-01 -3.67873461e-01 -1.38292306e-01 -6.58327027e-01 1.03379387e+00 -4.80475136e-01]
 [ 1.23814206e+00 -9.92836241e-02 1.24722329e-01 1.64963247e-01 2.86903415e-01 -1.79343481e-01 -1.95825003e-01 -1.89993855e-02 1.06067213e+00 -4.86384094e-01]
 [-1.46525932e+00 -9.92836241e-02 -4.41321860e-01 -3.05476685e-03 -4.94633396e-01 -7.03878840e-01 -1.36518668e-01 -6.98285004e-01 -5.84277375e-01 2.44445898e+00]
 [-7.49386104e-02 -9.92836241e-02 4.80570493e-01 -5.65009433e-01 6.06807084e-01 1.22706265e+00 -1.85802333e-01 6.40307245e-01 -1.75727826e-01 -4.80475136e-01]
 [ 5.44889451e+00 -9.92836241e-02 2.40101670e-01 2.40469736e+00 1.49223355e-01 -1.14928665e+00 -1.47735658e-02 -7.58221971e-01 -1.24548257e+00 -4.86384094e-01]
 [-4.04964233e-01 -9.92836241e-02 -3.06522235e-01 -2.68311729e-01 1.69470423e-01 -4.49065205e-01 -1.69032883e-01 5.09270753e-02 5.39233885e-01 6.30408932e-01]
 [-2.17715653e-01 -9.92836241e-02 1.27007069e-01 7.97293125e-01 2.05915145e-01 -4.49065205e-01 -1.07569740e-01 -4.78516128e-01 2.84538858e+00 -1.49573499e-01]
 [ 1.14217716e+00 -9.92836241e-02 4.68004426e-01 -4.85906499e-01 6.18955325e-01 1.14800514e-02 -1.32914258e-01 6.50296739e-01 9.63910390e-01 -4.74566179e-01]
 [-1.17422146e-02 -9.92836241e-02 -2.81390101e-01 8.06492010e-01 -3.77200403e-01 -6.76126860e-01 -1.88501293e-01 -5.88400566e-01 1.56598341e+00 1.69993030e+00]]
Accuracy: 90.19 %
n = 8
accuracies = []
ks = np.arange(1, n+1, 2)
# Fit one kNN model per odd k and record its test accuracy
for k in ks:
    print(k, ' ', end='')
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    predictions = knn.predict(X_test)
    acc = (predictions == y_test).mean()
    accuracies.append(acc)
print('done')

def get_best(ks, accuracies):
    # Return the k value(s) with the highest accuracy, and that accuracy
    accuracies = np.array(accuracies)
    maximum = accuracies.max()
    indexMax = np.where(accuracies == maximum)
    return ks[indexMax], maximum
best_k, best_acc = get_best(ks, accuracies)
print('best k = {}, best accuracy: {:0.3f}%'.format(best_k, best_acc * 100))
1 3 5 7 done
best k = [7], best accuracy: 90.518%
print('Comparison of predictions to y_test values: \n\n', predictions == y_test)
print('\nPredictions:\n\n', predictions)
print('\nY_test values:\n\n', y_test)
Comparison of predictions to y_test values:
[ True True True ... True True True]
Predictions:
[ True False False ... False False False]
Y_test values:
[ True False False ... False False False]
A further look at our predictions and y_test values shows that we get 90.226% accuracy simply by predicting almost everything as False, so this model's features and data should be improved.
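The observation above can be checked directly by comparing accuracy with a majority-class baseline and by looking at precision and recall for the True class. The following is a minimal sketch with made-up labels; `y_true` and `y_pred` are hypothetical arrays, not the notebook's results.

```python
import numpy as np

# Made-up labels: 2 of 10 rows are truly high risk, and the model finds one of them.
y_true = np.array([True, False, False, False, False, False, False, False, False, True])
y_pred = np.array([True, False, False, False, False, False, False, False, False, False])

accuracy = (y_pred == y_true).mean()

# Accuracy obtained by always predicting the most common class.
majority_baseline = max(y_true.mean(), 1 - y_true.mean())

# Confusion-matrix counts for the True (high risk) class.
tp = np.sum(y_pred & y_true)    # flagged and truly high risk
fn = np.sum(~y_pred & y_true)   # missed high-risk rows
fp = np.sum(y_pred & ~y_true)   # flagged but not high risk

recall = tp / (tp + fn)
precision = tp / (tp + fp)

print("accuracy:", accuracy)
print("majority baseline:", majority_baseline)
print("recall:", recall, "precision:", precision)
```

When the classes are imbalanced, a model is only useful if its accuracy clearly exceeds the majority baseline and its recall on the minority class is acceptable.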
predictors = ['diabetes_prevalence','icu_patients','life_expectancy', 'cardiovasc_death_rate','human_development_index', 'aged_70_older', 'population_density', 'female_smokers', 'male_smokers', 'extreme_poverty']
target = 'HighRisk'
X = covid_df_copy[predictors].values
y = covid_df_copy[target].values
# Split the data into training and test sets, and scale
scaler = StandardScaler()
# unscaled version (note that scaling is only used on predictor variables)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# scaled version
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
# Forward feature selection: for each k, greedily add the feature that
# improves test accuracy the most, and stop when no feature improves it
ks = np.arange(1, 8, 2)
for k in ks:
    best_acc = 0
    selected = []
    remaining = list(range(X_train.shape[1]))
    n = 11
    better = True
    while len(selected) < n and better:
        # find the single feature that works best in conjunction
        # with the already selected features
        acc_max = 0
        for i in remaining:
            # make a version of the training data with just selected + feature i
            selectedFi = selected.copy()
            selectedFi.append(i)
            X_si = X_train[:, selectedFi]
            # drop rows that contain NaN in any of these features
            y_siTrain = y_train[~np.isnan(X_si).any(axis=1)]
            X_si = X_si[~np.isnan(X_si).any(axis=1)]
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(X_si, y_siTrain)
            X_testSi = X_test[:, selectedFi]
            y_siTest = y_test[~np.isnan(X_testSi).any(axis=1)]
            X_testSi = X_testSi[~np.isnan(X_testSi).any(axis=1)]
            predictions = knn.predict(X_testSi)
            acc = (predictions == y_siTest).mean()
            if acc > acc_max:
                acc_max = acc
                i_min = i
        # compare against the best candidate of this round (acc_max),
        # not the accuracy of whichever feature was evaluated last
        if best_acc < acc_max:
            best_acc = acc_max
            better = True
        else:
            better = False
        if better:
            remaining.remove(i_min)
            selected.append(i_min)
            print('k: {}; num features: {}; features: {}; bestAcc: {:.2f}%'.format(k, len(selected), [predictors[x] for x in selected], best_acc*100))
k: 1; num features: 1; features: ['icu_patients']; bestAcc: 86.57%
k: 1; num features: 2; features: ['icu_patients', 'male_smokers']; bestAcc: 87.74%
k: 1; num features: 3; features: ['icu_patients', 'male_smokers', 'extreme_poverty']; bestAcc: 88.32%
k: 1; num features: 4; features: ['icu_patients', 'male_smokers', 'extreme_poverty', 'life_expectancy']; bestAcc: 88.42%
k: 1; num features: 5; features: ['icu_patients', 'male_smokers', 'extreme_poverty', 'life_expectancy', 'aged_70_older']; bestAcc: 88.46%
k: 3; num features: 1; features: ['icu_patients']; bestAcc: 86.88%
k: 3; num features: 2; features: ['icu_patients', 'life_expectancy']; bestAcc: 89.28%
k: 3; num features: 3; features: ['icu_patients', 'life_expectancy', 'female_smokers']; bestAcc: 89.80%
k: 3; num features: 4; features: ['icu_patients', 'life_expectancy', 'female_smokers', 'human_development_index']; bestAcc: 89.83%
k: 3; num features: 5; features: ['icu_patients', 'life_expectancy', 'female_smokers', 'human_development_index', 'population_density']; bestAcc: 90.09%
k: 5; num features: 1; features: ['life_expectancy']; bestAcc: 87.97%
k: 5; num features: 2; features: ['life_expectancy', 'icu_patients']; bestAcc: 89.88%
k: 5; num features: 3; features: ['life_expectancy', 'icu_patients', 'female_smokers']; bestAcc: 90.33%
k: 7; num features: 1; features: ['population_density']; bestAcc: 88.38%
k: 7; num features: 2; features: ['population_density', 'icu_patients']; bestAcc: 89.67%
k: 7; num features: 3; features: ['population_density', 'icu_patients', 'female_smokers']; bestAcc: 90.22%
k: 7; num features: 4; features: ['population_density', 'icu_patients', 'female_smokers', 'human_development_index']; bestAcc: 90.53%
Best feature set found (k = 7, 90.53%): 'population_density', 'icu_patients', 'female_smokers', 'human_development_index'
# change default plot size
rcParams['figure.figsize'] = 10,8
sns.scatterplot(data=covid_df_copy, x='male_smokers', y='cardiovasc_death_rate', hue='HighRisk', style='HighRisk');
plt.xlabel("Male Smokers")
plt.ylabel("Cardiovascular Death Rate");
sns.scatterplot(data=covid_df_copy, x='life_expectancy', y='diabetes_prevalence', hue='HighRisk', style='HighRisk')
plt.xlabel("Life Expectancy")
plt.ylabel("% of diabetes_prevalence");
sns.scatterplot(data=covid_df_copy, x='human_development_index', y='positive_rate', hue='HighRisk', style='HighRisk');
plt.xlabel("Human Development Index")
plt.ylabel("positive_rate");
sns.scatterplot(data=covid_df_copy, x='aged_65_older', y='diabetes_prevalence', hue='HighRisk', style='HighRisk');
plt.xlabel("% Age 65 and older")
plt.ylabel("% of diabetes_prevalence");
k = 7
predictors = ['diabetes_prevalence','icu_patients','female_smokers','male_smokers', 'human_development_index', 'life_expectancy','cardiovasc_death_rate','positive_rate']
target = 'HighRisk'
# unscaled version (note that scaling is only used on predictor variables)
X = covid_df_copy[predictors].values
y = covid_df_copy[target].values
# Split the data into training and test sets, and scale
scaler = StandardScaler()
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# scaled version
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
#Remove data rows with nan
y_train = y_train[~np.isnan(X_train).any(axis=1)]
X_train = X_train[~np.isnan(X_train).any(axis=1)]
y_test = y_test[~np.isnan(X_test).any(axis=1)]
X_test = X_test[~np.isnan(X_test).any(axis=1)]
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
acc = (predictions == y_test).mean()
print('Comparison of predictions to y_test values: \n\n', predictions == y_test)
print('\nPredictions:\n\n', predictions)
print('\nY_test values:\n\n', y_test)
print('\nAccuracy: ', acc)
Comparison of predictions to y_test values:
[ True True True ... True True True]
Predictions:
[ True False False ... False False False]
Y_test values:
[ True False False ... False False False]
Accuracy: 0.937109118740433
Based on the analysis, there are several things we can conclude.
1- The United States and India are among the countries most affected by COVID-19, with very high numbers of cases and deaths. Nevertheless, they also rank first in the world in COVID-19 vaccinations.
2- Using a HighRisk classification target, defined as COVID-19 deaths per million population more than 0.65 standard deviations above the mean (representing about 25% of the countries), we evaluated over 12 features provided in the dataset using kNN and a range of k values. With k=7, forward selection picked four features (population_density, icu_patients, female_smokers, human_development_index) at 90.53% accuracy, and the final eight-feature kNN model reached an accuracy of about 93%.
3- The Linear Regression forecast achieved an accuracy score of 0.9976567640125802. The fitted model predicted deaths for the next 30 days using independent variables such as age, GDP, diabetes prevalence, smoking rates, and hospital beds.
4- Although for most people COVID-19 causes only mild illness, it can make some people very ill. More rarely, the disease can be fatal. Older people and those with pre-existing medical conditions (such as high blood pressure, heart problems, or diabetes) appear to be more vulnerable.
5- The Government Response Stringency Index in the dataset is a composite measure based on nine response indicators, including school closures, workplace closures, and travel bans, rescaled to a value from 0 to 100 (100 = strictest response).
6- We were able to create several visualizations in the Jupyter Notebook, with scatterplots comparing the features that produced the best model against the HighRisk category. Age and obesity do seem to be factors in the mortality rate, with extra features such as smoking and cardiovascular disease helping improve the numbers further.
7- Commitment to virus prevention tools is important, especially in densely populated cities. Differences in population size between countries are often large, and the COVID-19 death count in more populous countries tends to be higher. Because of this, it can be insightful to know how the number of confirmed deaths in a country compares to the number of people who live there, especially when comparing across countries.
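The per-capita comparison described in the last point can be sketched in a few lines. The helper `per_million` and the two example countries are hypothetical and use made-up numbers; they only illustrate why raw counts and population-adjusted rates can rank countries differently.

```python
def per_million(count, population):
    """Scale a raw count to a rate per 1,000,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population

# Made-up (deaths, population) pairs for two hypothetical countries.
examples = {
    "Country A": (50_000, 100_000_000),  # many deaths, very large population
    "Country B": (5_000, 2_000_000),     # fewer deaths, small population
}

rates = {name: per_million(deaths, pop) for name, (deaths, pop) in examples.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} deaths per million")
```

In this made-up example Country A has ten times as many deaths as Country B, yet Country B has the higher rate per million, which is the quantity the `total_deaths_per_million` column expresses.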
Recommendations:
1- Assuming the published data are reliable, the SIR model can be applied to assess the spread of COVID-19 and to predict the numbers of infected, removed, and recovered people and deaths in communities, while accommodating possible surges in the number of susceptible individuals.
2- Preferably exclude the vaccine columns when analyzing.
3- Collect separate vaccine data, analyze it, and create a prediction model for daily and monthly vaccinations.
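As a rough illustration of the SIR model mentioned in the first recommendation, the sketch below integrates the standard SIR equations with small Euler steps. The function name `simulate_sir` and all parameter values (`beta`, `gamma`, population, initial infections) are assumptions chosen for illustration; they are not fitted to the OWID data. In this basic form the removed compartment R combines recovered people and deaths.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Euler integration of the basic SIR model.

    beta  : transmission rate per day (assumed value, not fitted)
    gamma : removal rate per day (1 / average infectious period, assumed)
    Returns a list of (day, susceptible, infected, removed) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0, s, i, r)]
    steps_per_day = int(round(1 / dt))
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((day, s, i, r))
    return history

# Hypothetical scenario: 1,000,000 people, 10 initial infections.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)

peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print("peak infections on day", peak_day, "with about", round(peak_infected), "people infected")
print("removed by day 160:", round(history[-1][3]))
```

A surge in susceptible individuals, as mentioned in the recommendation, could be modeled by adding people to `s` at a chosen day, and fitting `beta` and `gamma` to reported case counts would be a separate step.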
1- https://www.geeksforgeeks.org/python-programming-language/?ref=leftbar
2- https://www.python-course.eu/python3_class_and_instance_attributes.php
3- https://thispointer.com/data-analysis-in-python-using-pandas/
4- https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas
5- https://ourworldindata.org/coronavirus